Comment by reincoder

2 years ago

First, I have been a big fan of your articles since before I joined IPinfo, where we provide an IP geolocation data service.

Our geolocation methodology expands on the methodology you described. We utilize some of the publicly available datasets that you are using. However, the core geolocation data comes from our ping-based operation.

We ping an IP address from multiple servers across the world and identify its location through a process called multilateration. Pinging an IP address from one server gives us a single constraint: based on the round-trip time and certain other parameters, the IP address could be anywhere within a certain radius of that server on the globe. As we ping the same IP from our other servers, the location estimate becomes more precise. After enough pings, we have very precise IP location information that approaches zip code level precision with a high degree of accuracy. We currently have more than 600 probe servers across the world, and the network is expanding.
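The multilateration process described above can be sketched as a constraint search. This is a minimal illustration, not IPinfo's pipeline: it assumes signals travel through fiber at roughly two-thirds the speed of light (about 200 km per millisecond one way), treats each RTT as an upper bound on great-circle distance, and brute-forces a 1° latitude/longitude grid for the points that fall inside every probe's circle. The function names, probe coordinates, and RTTs are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0
# Assumption: light in optical fiber travels at roughly 2/3 of c,
# i.e. about 200 km per millisecond one way.
FIBER_KM_PER_MS = 299.792458 * 2 / 3


def max_distance_km(rtt_ms: float) -> float:
    """Upper bound on probe-to-target distance; half the RTT is one-way."""
    return (rtt_ms / 2) * FIBER_KM_PER_MS


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    # min() guards against floating-point values marginally above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def multilaterate(measurements, step_deg: float = 1.0):
    """Estimate a location from (probe_lat, probe_lon, rtt_ms) tuples.

    Keeps every grid point that lies inside all probe circles and returns
    (mean_lat, mean_lon, n_points) for that feasible region, or None.
    """
    limits = [(lat, lon, max_distance_km(rtt)) for lat, lon, rtt in measurements]
    feasible = []
    for i in range(int(180 / step_deg) + 1):
        lat = -90.0 + i * step_deg
        for j in range(int(360 / step_deg)):
            lon = -180.0 + j * step_deg
            if all(haversine_km(plat, plon, lat, lon) <= r for plat, plon, r in limits):
                feasible.append((lat, lon))
    if not feasible:
        return None
    # Plain lat/lon average: adequate for a small region away from the
    # antimeridian, not a true spherical centroid.
    mean_lat = sum(p[0] for p in feasible) / len(feasible)
    mean_lon = sum(p[1] for p in feasible) / len(feasible)
    return mean_lat, mean_lon, len(feasible)


# Three hypothetical probes (Amsterdam, Paris, Vienna) measuring one target.
probes = [(52.37, 4.90, 4.0), (48.86, 2.35, 6.0), (48.21, 16.37, 7.0)]
print(multilaterate(probes))
```

Real paths add routing detours and queuing delay, so each circle is loose; it is the growing number of probes that shrinks the feasible region.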

The publicly available information you are referring to is sometimes not very reliable for IP location data because:

- They are often stale and not frequently updated.

- They are not precise enough to be generally useful.

- They provide location context at the level of a large IP range or even a whole organization.

And last but not least, there is no verification process for these public datasets. With IPv4 trading and VPN services becoming more and more popular, we have seen evidence that inaccurate information is sometimes injected into these datasets. We are happy and grateful to anyone who submits IP location corrections to us, but for that reason we do verify these submissions.

From my experience with our probe network, I can definitely say that it is far easier and cheaper to buy a server in New York than in any country in the middle of Africa. The location of an IP address greatly influences the value it can provide.

We have a free IP to Country ASN database that you can use in your project if you like.

https://ipinfo.io/developers/ip-to-country-asn-database

Big fan of what articles? On https://incolumitas.com/ or on https://ipapi.is/?

Great idea with latency triangulation; I have used latency information for a lot of things, especially VPN and proxy detection.

But I didn't think you could obtain such accurate locations. I am honestly impressed, but latency triangulation with 600 servers does give a very good approximation. Nice, man!

Some questions:

- ICMP traffic is penalised/degraded by some ISPs. How do you deal with that?

- In order to geolocate every IPv4 address, you would need to constantly ping billions of IPv4 addresses. How do you do that? Do you only ping an arbitrary IP of each allocated inetnum/NetRange?

- Most IP addresses do not respond to ICMP packets. Only some servers do. How do you deal with that? Do you find the router in front of the target IP and you geolocate the closest router to the target IP (traceroute)?

  • https://incolumitas.com/

    This is my all-time favorite article: https://incolumitas.com/2021/11/03/so-you-want-to-scrape-lik...

    I used to do freelance web scraping, and that article felt like some kind of forbidden knowledge. After reading the article, I went down the rabbit hole and actually found a Discord server that provided carrier-grade traffic relay from a van which contained dozens of phones.

    As for the questions, we'll have to wait a bit; someone from our engineering team might come here and reply.

    By the way, since I have you here, have you considered converting the CSV files to MMDB format? I was planning to do that with our mmdbctl tool later today.

    https://github.com/ipinfo/mmdbctl

  • I'm very curious why you'd do VPN/proxy detection...

    But at a previous company I worked at, which ran a very large chunk of the internet, we indexed nearly the entire internet (even large portions of the dark web) approximately every two weeks. There were about 500 servers doing that non-stop, so I think it is reasonable to do this with 600 servers.

    • In the media streaming business, rights holders will require that you check for VPNs and proxies, in addition to the country, when deciding whether a given viewer may stream a given piece of media.

  • You can guess pretty well how IPs are related from BGP announcements, so as long as you measure a few addresses per announced block (or per ASN, if it is small), you can apply that logic.
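One reading of the BGP suggestion above is to probe only a few addresses per announced prefix instead of every address. A minimal sketch, assuming you already have a list of announced prefixes (the ones below are documentation ranges, not real announcements) and that addresses inside one prefix are usually routed and located together; `sample_targets` and its parameters are hypothetical:

```python
import ipaddress
import random


def sample_targets(prefixes, per_prefix=3, seed=0):
    """Pick a few distinct addresses inside each announced prefix to probe.

    Assumption: addresses within one BGP-announced prefix are usually routed
    (and therefore located) together, so a handful of measurements can stand
    in for the whole block.
    """
    rng = random.Random(seed)
    targets = {}
    for prefix in prefixes:
        net = ipaddress.ip_network(prefix)
        k = min(per_prefix, net.num_addresses)
        offsets = set()
        # randrange works for huge IPv6 prefixes, unlike sampling from range().
        while len(offsets) < k:
            offsets.add(rng.randrange(net.num_addresses))
        targets[str(net)] = [str(net[o]) for o in sorted(offsets)]
    return targets


# Documentation prefixes standing in for real BGP announcements.
print(sample_targets(["192.0.2.0/24", "198.51.100.7/32", "2001:db8::/48"]))
```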

Great comment. I'm a big fan and customer of IPinfo. We use your API in our login notification emails to say "You just logged in from Berlin, Germany. If this wasn't you, click here." We also use it to provide country data for customers in their audit logs, and for anti-spam and fraud detection.

  • I appreciate it, sir! If you have any questions or feedback, please let us know.

    The challenge of being a data provider is that our data can be used in a million ways, and we can't anticipate all of them. So when you come up with questions or ideas, let us know and we can help you better.

    Since you mentioned audit logs, I highly recommend you look into the ASN field.

    An ASN (autonomous system number) identifies a network operated by an organization that announces blocks of IP addresses. In my experience, the combination of ASN + country is the most valuable information you can use in spam and fraud detection. You can fake the IP geolocation information with a VPN, but it is not as easy to fake the ASN information of the IP address. So, when you use a combination of country + ASN, you can build a robust cybersecurity system.
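A minimal sketch of the country + ASN idea, assuming you store the (country, ASN) pairs seen on each account's earlier logins. The labels and the rule (an unseen pair is treated as riskier than an unseen country or ASN alone) are hypothetical illustrations, not IPinfo's scoring:

```python
def classify_login(seen_pairs, country, asn):
    """Coarse, hypothetical risk label for a login.

    seen_pairs: set of (country_code, asn) tuples from the account's
    earlier logins.
    """
    if (country, asn) in seen_pairs:
        return "familiar"
    known_countries = {c for c, _ in seen_pairs}
    known_asns = {a for _, a in seen_pairs}
    if country in known_countries or asn in known_asns:
        # e.g. same country on a new network, or a known network abroad
        return "partially_new"
    return "new"


history = {("DE", 3320), ("DE", 6805)}
print(classify_login(history, "DE", 3320))   # familiar
print(classify_login(history, "IN", 24560))  # new
```

The point of keying on the pair is the one made above: a VPN can change the apparent country easily, but it usually also swaps in a different ASN.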

    • Can you explain more how to use ASN to detect fraud and how it's different from the country detected for the IP? I thought ASN was derived from the IP, basically the route to that IP? Here's the ipinfo response for an IP used by a recent fraud signup attempt. The asn field matches country.

        {
          "city": "Mumbai",
          "connection": {
            "asn": 24560,
            "isp": "Bharti Airtel Ltd."
          },
          "continent_code": "AS",
          "continent_name": "Asia",
          "country_code": "IN",
          "country_name": "India",
          "currency": {
            "code": "INR",
            "name": "Indian Rupee",
            "plural": "Indian rupees",
            "symbol": "Rs",
            "symbol_native": "\u099f\u0995\u09be"
          },
          "ip": "2401:4900:1f38:7402:5569:2e45:3bb:9c0d",
          "latitude": 19.076000213623047,
          "location": {
            "calling_code": "91",
            "capital": "New Delhi",
            "country_flag": "https://assets.ipstack.com/flags/in.svg",
            "country_flag_emoji": "\ud83c\uddee\ud83c\uddf3",
            "country_flag_emoji_unicode": "U+1F1EE U+1F1F3",
            "geoname_id": 1275339,
            "is_eu": false,
            "languages": [
              {
                "code": "hi",
                "name": "Hindi",
                "native": "\u0939\u093f\u0928\u094d\u0926\u0940"
              },
              {
                "code": "en",
                "name": "English",
                "native": "English"
              }
            ]
          },
          "longitude": 72.87770080566406,
          "region_code": "MH",
          "region_name": "Maharashtra",
          "time_zone": {
            "code": "IST",
            "current_time": "2023-09-15T10:52:42+05:30",
            "gmt_offset": 19800,
            "id": "Asia/Kolkata",
            "is_daylight_saving": false
          },
          "type": "ipv6",
          "zip": "400203"
        }
      

      Here's the response from ipinfo.io which includes privacy fields. It's technically a proxy but might be hard to detect because it's probably a crowdsourced/botnet proxy not a public one. We don't pay for

        {
          "ip": "2401:4900:1f38:7402:5569:2e45:3bb:9c0d",
          "city": "Najafgarh",
          "region": "Delhi",
          "country": "IN",
          "loc": "28.6114,77.2982",
          "org": "AS24560 Bharti Airtel Ltd., Telemedia Services",
          "postal": "110097",
          "timezone": "Asia/Kolkata",
          "asn": {
            "asn": "AS24560",
            "name": "Bharti Airtel Ltd., Telemedia Services",
            "domain": "airtel.com",
            "route": "2401:4900:1f38::/48",
            "type": "isp"
          },
          "company": {
            "name": "ABTS (Karnataka),",
            "domain": "airtel.com",
            "type": "isp"
          },
          "privacy": {
            "vpn": false,
            "proxy": false,
            "tor": false,
            "relay": false,
            "hosting": false,
            "service": ""
          },
          "abuse": {
            "address": "Bharti Airtel Ltd., ISP Division - Transport Network Group, 234 , Okhla Industrial Estate,, Phase III, New Delhi-110020, INDIA",
            "country": "IN",
            "email": "ip.misuse@airtel.com",
            "name": "ABUSE BHARTIIN",
            "network": "2401:4900:1f30::/44",
            "phone": "+000000000"
          }
        }
      

      EDIT: Oops, I confused ipinfo with ipstack. I'm actually using ipstack. Their security field also doesn't detect this IP as a proxy, which is why we only pay for Professional (no security field).

        {
          "ip": "2401:4900:1f38:7402:5569:2e45:3bb:9c0d",
          "type": "ipv6",
          "continent_code": "AS",
          "continent_name": "Asia",
          "country_code": "IN",
          "country_name": "India",
          "region_code": "MH",
          "region_name": "Maharashtra",
          "city": "Mumbai",
          "zip": "400203",
          "latitude": 19.076000213623047,
          "longitude": 72.87770080566406,
          "location": {
            "geoname_id": 1275339,
            "capital": "New Delhi",
            "languages": [
              {
                "code": "hi",
                "name": "Hindi",
                "native": "\u0939\u093f\u0928\u094d\u0926\u0940"
              },
              {
                "code": "en",
                "name": "English",
                "native": "English"
              }
            ],
            "country_flag": "https://assets.ipstack.com/flags/in.svg",
            "country_flag_emoji": "\ud83c\uddee\ud83c\uddf3",
            "country_flag_emoji_unicode": "U+1F1EE U+1F1F3",
            "calling_code": "91",
            "is_eu": false
          },
          "time_zone": {
            "id": "Asia/Kolkata",
            "current_time": "2023-09-15T12:27:08+05:30",
            "gmt_offset": 19800,
            "code": "IST",
            "is_daylight_saving": false
          },
          "currency": {
            "code": "INR",
            "name": "Indian Rupee",
            "plural": "Indian rupees",
            "symbol": "Rs",
            "symbol_native": "\u099f\u0995\u09be"
          },
          "connection": {
            "asn": 24560,
            "isp": "Bharti Airtel Ltd."
          },
          "security": {
            "is_proxy": false,
            "proxy_type": null,
            "is_crawler": false,
            "crawler_name": null,
            "crawler_type": null,
            "is_tor": false,
            "threat_level": "low",
            "threat_types": null
          }
        }
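For the ipinfo response with `privacy` fields quoted earlier in this thread, the signals discussed here can be pulled out in a few lines. A sketch that assumes the field names shown in that response (`country`, `asn.asn`, `asn.type`, `privacy.*`); other plans and other providers, such as the ipstack response, use different keys, and `risk_signals` is a hypothetical helper:

```python
import json


def risk_signals(raw: str) -> dict:
    """Extract country, ASN, and privacy flags from an ipinfo-style JSON body."""
    data = json.loads(raw)
    asn = data.get("asn", {})
    privacy = data.get("privacy", {})
    asn_id = asn.get("asn", "")  # e.g. "AS24560"
    return {
        "country": data.get("country"),
        "asn": int(asn_id[2:]) if asn_id.startswith("AS") else None,
        "asn_type": asn.get("type"),
        # Names of any privacy flags that are set to true.
        "privacy_flags": sorted(k for k, v in privacy.items() if v is True),
    }


sample = """{
  "country": "IN",
  "asn": {"asn": "AS24560", "type": "isp"},
  "privacy": {"vpn": false, "proxy": false, "tor": false,
              "relay": false, "hosting": false, "service": ""}
}"""
print(risk_signals(sample))
```

For the IP above every privacy flag is false, so the (IN, 24560) pair is what remains to compare against the account's history.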

Have you considered making your database available for download in Parquet format, so people could just copy the file to S3, Google Cloud, etc., and query it immediately with various tools?

I know it can be done with CSV but it's not as smooth.

  • Thank you for the feature request.

    We usually just send users the documentation for ingesting the data in CSV or NDJSON (newline-delimited JSON) format. We don't actually get many requests for data downloads in Parquet format, though I think we have a few customers for whom we deliver the data in Parquet format directly to their cloud storage bucket.

    But keep an eye out for our emails in case we announce Parquet data downloads. I will talk with the folks about this.

    BUT, there is some good news.

    At least for the free database, we deliver the data directly to data warehouse platforms, so you don't even need a storage bucket. We also supply a good amount of documentation.

    We have the free database in Snowflake, GCP, Kaggle, and Splitgraph, and we are working on a few more deals. For the free database at least, we are working on better things than Parquet, like a literal one-click solution to bring the IP data into your data warehouse.

    Kaggle: https://www.kaggle.com/code/ipinfo/ipinfo-ip-to-country-asn-...

    Snowflake: https://app.snowflake.com/marketplace/listing/GZSTZSHKQ4QY/i...

    If you want to use our free IP database on Google Cloud or BigQuery, please send us an email (support@ipinfo.io) and mention that the DevRel sent you from HN. I can easily set you up with the free IP database in GCP/BQ.

Your comment is extremely interesting and what I was hoping to learn from the article (without an existing source of information, how do we determine the location of an IP address). Thank you!

  • I really appreciate it. Thank you. We are very transparent about our process. If you have any questions, you can always reach out to us.

    We have a simplified explanation of our probe network here: https://ipinfo.io/blog/probe-network-how-we-make-sure-our-da...

    The only update is that the number of servers is now 600+. The probe network is growing extremely rapidly.

    Our IP geolocation process is quite complicated, and we have a team of data engineers, infrastructure engineers, and data scientists working on various aspects of it. Our approach, therefore, is that users can ask us questions, and we will try our best to answer them.

    • Just wanted to let you know, it's this transparency that turned me into a customer!

      I love your company and service, but I hate your pricing. I work with a lot of small clients/apps for which paying for usage would be a no-brainer, but the defined monthly price buckets don't make economic sense at their scale. If you added a "pay as you go" tier, so that a small app could reasonably start with a few dollars' worth of API calls per month and grow from there, I'd be spreading your seed all over the place. I'm not saying this to rag on you, just trying to provide some constructive feedback as a thank-you for your info sharing!

I just noticed that my wife's iPhone kept the same mycingular IP address while checking mail during a drive across 3 states over 5 hours.

  • There are several options/techniques for doing it. But just imagine you have a permanent, zero-overhead VPN.

    I don't know if that provider terminates long-running calls, but calls would stay up too, regardless of the tower.

Would you consider no-signup inspection of the data you hold on the requester's IP address? I would love to see what you have on MY IP address, and if it is sufficiently accurate, it feels like that would be a good incentive to sign up and use it commercially.

It feels like it couldn't be abused by 'freeloaders', because I'd guess their use case is viewing other people's.

  • We have a very open approach to our data. In fact, our website is extremely accessible. It is quite useful for researching IP addresses and does not require signing up. The data is largely available to view on the website. Although we display all IP address metadata on the home page, if you intend to use our website frequently, I recommend utilizing the IP data pages.

    You can enter IP addresses on the right side to look up information here: https://ipinfo.io/what-is-my-ip

    Additionally, we offer some enjoyable tools that you can use here: https://ipinfo.io/tools

    The CLI tool is particularly entertaining.

    You can also use our API service without signing up, with a limit of 1000 requests per day.

    If you do choose to sign up for a free account, you will receive 50,000 requests per month, free IP databases, a bulk lookup feature, and more.
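Calling the API without signing up can be sketched with the standard library. The `https://ipinfo.io/{ip}/json` path, the behavior of `/json` without an IP (reporting on the caller), and the optional `token` query parameter are assumptions based on IPinfo's public API; check the current documentation. `lookup_url` is pure string building, while `lookup` makes a real HTTP request that counts against the 1,000 requests per day mentioned above.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

BASE_URL = "https://ipinfo.io"  # assumed endpoint layout, see lead-in


def lookup_url(ip: Optional[str] = None, token: Optional[str] = None) -> str:
    """Build the lookup URL; assumed: without an IP, the API reports on the caller."""
    url = f"{BASE_URL}/{ip}/json" if ip else f"{BASE_URL}/json"
    if token:
        url += "?" + urllib.parse.urlencode({"token": token})
    return url


def lookup(ip: Optional[str] = None, token: Optional[str] = None,
           timeout: float = 5.0) -> dict:
    """Fetch and decode one lookup (performs a real HTTP request)."""
    with urllib.request.urlopen(lookup_url(ip, token), timeout=timeout) as resp:
        return json.load(resp)


# print(lookup("8.8.8.8"))  # uncomment to make a live request
```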

  • This is literally the most prominent thing on the https://ipinfo.io home page.

    • That's embarrassing for me... I thought that was a static image of an example. And I did look through the site looking for a search. Oops.

    • Huh, that's cool. It got my home IP about 15 miles from where I am, but still not bad.

      Wait - how does this work for cell IPs? A lot of cellphone v4 IPs are now shared between hundreds or thousands of devices, right?

Not gonna lie, this creeps the heck out of me.

  • Thousands of people live in a zip code, while hundreds of thousands of people live in a city. We are literally giving away that data for free through our API and database. The creepiness of IP geolocation is mostly a meme.

    IP geolocation is mainly used in cybersecurity and marketing analytics. There are many ways to geolocate someone. I once came across a project that could estimate the country a user is from based on their writing style and grammar mistakes. For example, American people sometimes use "should of" instead of "should have". Knowing the geolocation of an IP address isn't super creepy. It's just how things work on the internet.

    • And you're literally advertising this project as being helpful for targeted ads. So it's pretty clear from the get-go that what you consider creepy isn't what I consider creepy. And having done enough reidentification work to scare myself, "thousands of people" might as well be a couple dozen or less. I get why you're defensive and why you think it's not creepy, but calling it a "meme" is insultingly dismissive.

      Just because it's "how things work on the internet" doesn't make its mass collection right. Under the same logic, any side channel attack is just "how it works", and its abuse warrants no ethical question.

  • You might want to unplug your router then. A conceit of being connected to a network is you're connected to the network. If you can see other nodes they can see you.

    • Is this an accepted usage of the word `conceit`? I love the construction, and it does feel like it belongs, but I'm not finding this usage. https://www.wordnik.com/words/conceit has a bunch of meanings collected from various sources.

      I wonder if you meant `concession`.

      Also, it's a false dichotomy. One can use VPNs or proxies to limit exposure or to encapsulate it. Of course, you can't get perfect location privacy.

  • Together with the tons of data leaked by browsers it makes it very easy to track people across places and devices.

Can your probes be identified and blocked?

  • It is just ping data. We ping an IP address, get the RTT, draw a circle of the corresponding radius on the globe, and say that the IP could be anywhere inside it. Then we do another ping and draw another circle, and your IP address could be where the two circles intersect. If we do this enough times, we can estimate where the IP address is located.

    The data is not derived from the IP address itself, but rather from the process itself. And it's just a ping. Moreover, the majority of the IP addresses are not pingable. So, we rely on other in house statistical and scientific models to estimate the location. The probe infrastructure is extremely complicated and there are billions and billions of IP addresses, which is why we do not have a robust range filter mechanism.

    You can implement a dynamic ping blocking mechanism or use our data to find hosting ASNs and block ranges of those ASNs. You can download the database for free: https://ipinfo.io/developers/ip-to-country-asn-database
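Turning the free database into a block list can be sketched as follows. This assumes you have already parsed the download into `(start_ip, end_ip, asn)` rows and decided yourself which ASNs count as hosting providers; the ranges and ASNs below (`AS64500`, `AS64501` are reserved for documentation) are placeholders, not data from the file, and both helpers are hypothetical.

```python
import ipaddress


def hosting_cidrs(rows, hosting_asns):
    """Convert (start_ip, end_ip, asn) rows for chosen ASNs into CIDR blocks.

    ipaddress.collapse_addresses() refuses mixed IPv4/IPv6 input, so each
    version is collapsed separately.
    """
    nets = []
    for start, end, asn in rows:
        if asn in hosting_asns:
            first, last = ipaddress.ip_address(start), ipaddress.ip_address(end)
            nets.extend(ipaddress.summarize_address_range(first, last))
    v4 = [n for n in nets if n.version == 4]
    v6 = [n for n in nets if n.version == 6]
    return list(ipaddress.collapse_addresses(v4)) + list(ipaddress.collapse_addresses(v6))


def is_blocked(ip, cidrs):
    """True if ip falls inside any of the blocked networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in cidrs)


# Placeholder rows using documentation ranges and ASNs.
rows = [
    ("192.0.2.0", "192.0.2.255", "AS64500"),
    ("198.51.100.0", "198.51.100.127", "AS64501"),
    ("203.0.113.0", "203.0.113.255", "AS64500"),
]
blocked = hosting_cidrs(rows, {"AS64500"})
print([str(n) for n in blocked])
```

Emitting CIDR blocks rather than raw start/end pairs makes the result easy to load into a firewall address set.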

  •     iptables -A INPUT -p icmp --icmp-type echo-request -j DROP

    • This doesn't help. Even if you apply this at your router, you are locatable up to your ISP. Which is generally close enough.

      Maybe if you delay pings by some amount (20ms? 100ms?), or randomize the delay, you can do a lot better at masking location.

    • Indeed. OpenWrt for some reason defaults to replying to pings. I see the value of ICMP for servers, but I don't see the value for home ISP routers.

      I disabled ICMP reply on my home router.

Hi, cool idea with the geolocation via latency.

But I noticed something while using ipinfo: Hetzner servers that are in Germany, in a fixed location, and have never moved are sometimes located in another country. It happened to me twice: once a server was placed in Moscow and once in South America.

How does this happen?

  • If you can give me some information about the IP address or the IP range, I can take a closer look.

    I guess it is because of IPv4 trading or IP address shuffling.

    As far as I know, Hetzner, like many hosting companies, is buying IPv4 addresses around the world. Here is an article on IPv4 trades:

    https://tech.marksblogg.com/ipinfo-free-ip-address-location-...

    When a company buys an IP address block or relocates an IP block from one of its data centers to another, the location of those IP addresses changes.

    If your IP address is static, but we have made an error in geolocation, I would love to take a closer look. You can email our support (support@ipinfo.io) and send a link to the comment. We can discuss it further from there.

Are there any historical sources for geo ip info?

  • We don't have any free data for that. We have historical data that we sell as part of our custom enterprise deals. Historical data requests are rare, though.

    A time-series IP database would, I imagine, require a substantial amount of storage and computational cost to query. The city-level geolocation data we have is ~1.5 GB in size. IP range data is complicated to query efficiently, as you need to understand data platform settings and a good amount of network math and computer science. Adding a layer of time-series complexity on top of that makes the process quite difficult.

    To give you some context on how IP metadata lookups work, you can check out this article:

    https://ipinfo.io/blog/ip-address-data-in-snowflake/

    Even if you keep all your database in a binary format, the computational cost is still non-negligible.
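To make the range-lookup point concrete, here is a minimal in-memory sketch: sort the ranges once, then binary-search the start addresses. It assumes non-overlapping `(start_ip, end_ip, value)` rows and keys them by IP version so IPv4 and IPv6 integers never collide; the rows are placeholders. A warehouse join like the one in the Snowflake article works differently, and a historical version would need an extra validity interval per row.

```python
import bisect
import ipaddress


class RangeTable:
    """In-memory IP range table with O(log n) lookups via binary search."""

    def __init__(self, rows):
        # rows: (start_ip, end_ip, value); ranges are assumed not to overlap.
        # Keys carry the IP version so IPv4 and IPv6 integers never mix.
        parsed = []
        for start, end, value in rows:
            s, e = ipaddress.ip_address(start), ipaddress.ip_address(end)
            parsed.append(((s.version, int(s)), int(e), value))
        parsed.sort(key=lambda r: r[0])
        self._keys = [r[0] for r in parsed]
        self._rows = parsed

    def lookup(self, ip):
        """Return the value of the range containing ip, or None."""
        addr = ipaddress.ip_address(ip)
        key = (addr.version, int(addr))
        # Index of the last range whose start is <= the address.
        i = bisect.bisect_right(self._keys, key) - 1
        if i >= 0:
            (version, _), end, value = self._rows[i]
            if version == addr.version and int(addr) <= end:
                return value
        return None


table = RangeTable([
    ("1.0.0.0", "1.0.0.255", "AU"),
    ("8.8.8.0", "8.8.8.255", "US"),
    ("2001:db8::", "2001:db8::ffff", "ZZ"),
])
print(table.lookup("8.8.8.8"), table.lookup("9.9.9.9"))
```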

hm, ipinfo.io tells me that I'm using a VPN even though I'm not...

  • Our VPN recognition is behavior-based. So, there is probably a chance that the IP address you are using is showing some of those behavior patterns.

    A behavior pattern could be that your IP address is being shuffled around random locations that go beyond the normal location shuffling of an ISP connection.

    Also, if your IP range is listed in some public datasets that belong to a VPN service, we could recognize your IP as a VPN.

    Please reach out to our support and let us know about this. Thanks!
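IPinfo has not published its behavior model, so the following only illustrates one pattern of the kind described above: an "impossible travel" check that flags an IP whose successive geolocations imply movement faster than a normal ISP connection would plausibly show. The observation format, function names, and the 1,000 km/h threshold are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def max_implied_speed_kmh(observations):
    """Fastest movement implied by (unix_seconds, lat, lon) snapshots of one IP."""
    obs = sorted(observations)
    fastest = 0.0
    for (t1, lat1, lon1), (t2, lat2, lon2) in zip(obs, obs[1:]):
        hours = (t2 - t1) / 3600
        if hours > 0:  # skip duplicate timestamps
            fastest = max(fastest, haversine_km(lat1, lon1, lat2, lon2) / hours)
    return fastest


def looks_shuffled(observations, limit_kmh=1000.0):
    """Hypothetical rule: flag movement faster than a plausible limit."""
    return max_implied_speed_kmh(observations) > limit_kmh


berlin_to_munich = [(0, 52.52, 13.40), (3600, 48.14, 11.58)]
berlin_to_new_york = [(0, 52.52, 13.40), (600, 40.71, -74.01)]
print(looks_shuffled(berlin_to_munich), looks_shuffled(berlin_to_new_york))
```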