Comment by TZubiri

2 years ago

"how to scrape an ip geolocation database"

You know you can just run a whois query per ip you want to analyze, no point in scraping the whole ipvN space.

I have to scrape the whole IP address space since I offer location information as part of my API.

Also, I only need to scrape as many WHOIS records as there are distinct networks out there. For the IPv4 address space, for example, there are far fewer networks than there are IPv4 addresses (2^32).
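To illustrate why one record per network is enough, here is a minimal sketch of a lookup that stores only network ranges and still answers for any single address. The `RECORDS` list, its labels, and the function names are made up for the example, and it assumes the networks do not overlap.

```python
import bisect
import ipaddress

# Hypothetical data: one entry per network (documentation ranges), not per address.
RECORDS = [
    ("192.0.2.0/24", "net-a"),
    ("198.51.100.0/24", "net-b"),
    ("203.0.113.0/24", "net-c"),
]


def build_index(records):
    """Turn CIDR records into a sorted list of (first, last, label) integer ranges."""
    rows = []
    for cidr, label in records:
        net = ipaddress.ip_network(cidr)
        rows.append((int(net.network_address), int(net.broadcast_address), label))
    rows.sort()
    starts = [row[0] for row in rows]
    return starts, rows


def lookup(index, ip):
    """Return the label of the network containing ip, or None if no network matches."""
    starts, rows = index
    value = int(ipaddress.ip_address(ip))
    # Find the last range that starts at or before the address.
    pos = bisect.bisect_right(starts, value) - 1
    if pos >= 0 and rows[pos][1] >= value:
        return rows[pos][2]
    return None


index = build_index(RECORDS)
print(lookup(index, "192.0.2.77"))
print(lookup(index, "192.0.3.1"))
```

Three records cover 768 addresses here; the same idea is what keeps the number of WHOIS records far below 2^32.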

Also, most RIRs provide their WHOIS databases for download.
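As a rough sketch of what processing such a download can look like, the snippet below parses an RPSL-style text dump, assuming blank-line separated objects with `key: value` lines and an `inetnum` start/end range. The exact format differs between registries, so treat the sample data and the helper names as assumptions rather than any RIR's actual layout.

```python
import ipaddress

# Hypothetical sample in an RPSL-like layout; real dumps vary by registry.
SAMPLE_DUMP = """\
inetnum:        192.0.2.0 - 192.0.2.255
netname:        EXAMPLE-NET-1
country:        ZZ

inetnum:        198.51.100.0 - 198.51.100.127
netname:        EXAMPLE-NET-2
country:        ZZ
"""


def parse_objects(text):
    """Split a dump into objects, keeping the first value seen for each key."""
    objects = []
    for block in text.strip().split("\n\n"):
        obj = {}
        for line in block.splitlines():
            # Skip comments and continuation lines (a simplification).
            if not line or line[0] in "#% \t+" or ":" not in line:
                continue
            key, _, value = line.partition(":")
            obj.setdefault(key.strip(), value.strip())
        if obj:
            objects.append(obj)
    return objects


def inetnum_to_networks(obj):
    """Convert an 'inetnum: first - last' range into a list of CIDR networks."""
    first, _, last = obj["inetnum"].partition("-")
    return list(
        ipaddress.summarize_address_range(
            ipaddress.ip_address(first.strip()),
            ipaddress.ip_address(last.strip()),
        )
    )


objects = parse_objects(SAMPLE_DUMP)
for obj in objects:
    print(obj["netname"], inetnum_to_networks(obj))
```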

Therefore, "scraping" is not really the correct word; it's a hybrid approach, mostly based on publicly available data from the five RIRs.

WHOIS has no sane format.

  • RDAP is run by all the RIRs, returns JSON, and has all the WHOIS data except IRR.

    It also answers with a 302 redirect to the best source.
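For anyone who wants to try the RDAP route, here is a minimal sketch using only the Python standard library. It assumes `https://rdap.org/ip/` as a public bootstrap endpoint that redirects to the responsible registry (any RIR's own RDAP base URL could be passed instead), and the `SAMPLE_RESPONSE` values are invented. The field names (`handle`, `name`, `country`, `startAddress`, `endAddress`) are those of the RDAP IP network object.

```python
import json
import urllib.request

# Assumption: rdap.org is used as a bootstrap redirector; swap in another base if preferred.
RDAP_BASE = "https://rdap.org/ip/"

FIELDS = ("handle", "name", "country", "startAddress", "endAddress")


def rdap_url(ip, base=RDAP_BASE):
    """Build the RDAP lookup URL for a single IP address."""
    return base + ip


def fetch_rdap(ip):
    """Fetch and decode the RDAP JSON for ip; urlopen follows 302 redirects itself."""
    request = urllib.request.Request(
        rdap_url(ip), headers={"Accept": "application/rdap+json"}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def summarize(doc):
    """Keep only a few top-level fields; missing ones come back as None."""
    return {key: doc.get(key) for key in FIELDS}


# Hypothetical response body, trimmed to the fields used above.
SAMPLE_RESPONSE = {
    "objectClassName": "ip network",
    "handle": "EXAMPLE-HANDLE",
    "name": "EXAMPLE-NET",
    "startAddress": "192.0.2.0",
    "endAddress": "192.0.2.255",
}

print(rdap_url("192.0.2.1"))
print(summarize(SAMPLE_RESPONSE))
```

Calling `summarize(fetch_rdap("192.0.2.1"))` would do a live lookup. Not every registry fills in `country`, which is why the sketch uses `doc.get`.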