Comment by edg5000

4 hours ago

Residential proxies are the only way to crawl and scrape. It's ironic for this article to come from the biggest scraping company that ever existed!

If you crawl at 1 Hz (one request per second) per crawled IP, no reasonable server would suffer from it. It's the few bad apples (impatient people who don't rate limit) who ruin the internet for users and hosts alike. And then there's Google.
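
A minimal sketch of that kind of rate limiting, in Python. The class name `PerHostRateLimiter` and its parameters are made up for illustration, and it keys on the URL's hostname rather than the resolved IP, which is a simplifying assumption (several hostnames can share one IP):

```python
import time
from urllib.parse import urlsplit


class PerHostRateLimiter:
    """Allow at most one request per `min_interval` seconds to each host.

    Hypothetical helper: keyed by hostname, not by resolved IP address.
    """

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # 1.0 second between requests = 1 Hz
        self._clock = clock               # injectable so the limiter can be tested
        self._sleep = sleep
        self._next_allowed = {}           # host -> earliest time the next request may start

    def wait(self, url):
        """Block until a request to the URL's host is allowed; return seconds slept."""
        host = urlsplit(url).hostname or ""
        now = self._clock()
        ready_at = self._next_allowed.get(host, now)
        delay = max(0.0, ready_at - now)
        if delay > 0:
            self._sleep(delay)
        # The request starts at max(now, ready_at); the next one must wait a full interval.
        self._next_allowed[host] = max(now, ready_at) + self.min_interval
        return delay
```

Call `limiter.wait(url)` right before each fetch. Requests to different hosts don't delay each other; only repeated hits on the same host are spaced out.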

One thing about Google is that many anti-scraping services explicitly allow access to Google and maybe a couple of other search engines. Everybody else gets to enjoy the Cloudflare captcha, even when crawling at reasonable speeds.

Rules For Thee but Not for Me

  • > many anti-scraping services explicitly allow access to Google and maybe a couple of other search engines.

    Because Google (and the couple of other search engines) provide enough value to offset their crawlers' resource consumption.