Comment by inejge
3 days ago
> Once you start blocking actual users out of your sites, it simply has gone too far.
It has; scrapers are out of control. Anubis and its ilk are a desperate measure, and some fallout is expected. And you don't get to dictate how a non-commercial site tries to avoid throttling and/or bandwidth-overage bills.
No, they are a lazy measure. Most websites that slap on these kinds of checks don't even bother with more human-friendly measures first.
Because I don't have the fucking time to deal with AI scraper bots. I went harder - anything that even looks suspiciously close to a scraper and isn't on Google's index [1], or has wget in its user agent, gets its entire /24 hard-banned for a month, with an email address to contact for unbanning.
That seems to be a pretty effective way, for now, to keep scrapers, spammers and other abusive behavior away. Normal users don't perform certain site actions at the speed that scraper bots do; there's no other practically relevant search engine than Google; I've never once seen an abusive bot disguise itself as wget (they all try to look like a human-operated web browser); and no AI agent is yet smart enough to interpret the message "Your ISP's network appears to have been used by bot activity. Please write an email to xxx@yyy.zzz with <ABC> as the subject line (or click on this pre-filled link) and you will automatically get unblocked".
[1] https://developers.google.com/search/docs/crawling-indexing/...
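For anyone curious what that filter amounts to, here is a minimal sketch (Python/Flask; all names, thresholds, and the Flask setup are illustrative assumptions, not the commenter's actual stack): wget in the user agent, or a request that doesn't pass the reverse-DNS Googlebot verification Google documents at [1], gets its /24 banned for a month, and banned networks see the unban email in the 403 response.

```python
# Sketch only: IPv4 assumed, in-memory ban list, hypothetical email/subject.
import ipaddress
import socket
import time

from flask import Flask, abort, request

app = Flask(__name__)

BAN_SECONDS = 30 * 24 * 3600          # one month
banned_until: dict[str, float] = {}   # /24 network -> unban timestamp


def network_24(ip: str) -> str:
    """Collapse an IPv4 address to its /24 network."""
    return str(ipaddress.ip_network(f"{ip}/24", strict=False))


def is_verified_googlebot(ip: str) -> bool:
    """Reverse-DNS + forward-DNS check, per Google's crawler verification docs."""
    try:
        host = socket.gethostbyaddr(ip)[0]
        if not host.endswith((".googlebot.com", ".google.com")):
            return False
        return ip in socket.gethostbyname_ex(host)[2]
    except OSError:
        return False


def looks_like_scraper(req) -> bool:
    """Placeholder for the site's own 'too fast / too mechanical' heuristics."""
    return "wget" in (req.user_agent.string or "").lower()


@app.before_request
def block_scrapers():
    ip = request.remote_addr
    net = network_24(ip)

    # Already banned: show the unblock instructions.
    if banned_until.get(net, 0) > time.time():
        abort(403, "Your ISP's network appears to have been used by bot "
                   "activity. Please write an email to xxx@yyy.zzz with "
                   "<ABC> as the subject line and you will get unblocked.")

    # Looks like a scraper and isn't a verified Google crawler: ban the /24.
    if looks_like_scraper(request) and not is_verified_googlebot(ip):
        banned_until[net] = time.time() + BAN_SECONDS
        abort(403)
```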
> Normal users don't do certain site actions at the speed that scraper bots do
How would you know, when you have already banned them?