Comment by lobsterthief

8 hours ago

For Google, just read their publicly-published list of crawler IPs. They’re broken down into 3 JSON files by category. One set of IPs is for GoogleBot (the web crawler), one is for special requests like things from Google Search Console, and one is special crawlers related to things like Google Ads.

You can ingest this IP list periodically and set rules based on those IPs instead. Makes you not prone to the blackhat SEO tactic you mentioned. In fact, you could completely block GoogleBot UA strings that don’t match the IPs, without harming SEO, since those UA strings are being spoofed ;)