Comment by account42

2 days ago

No, they are a lazy measure. Most websites that slap on these kinds of checks don't even bother with more human-friendly measures first.

Because I don't have the fucking time to deal with AI scraper bots. I went harder: anything that looks even suspiciously close to a scraper and isn't one of Google's verified crawlers [1], or that has wget in its user agent, gets its entire /24 hard-banned for a month, with an email address to contact for unbanning.

That seems to be a pretty effective way, for now, to keep scrapers, spammers and other abusive behavior away. Normal users don't do certain site actions at the speed that scraper bots do, there's no other practically relevant search engine than Google, I've never once seen an abusive bot identify itself as wget (they all try to look like a human-operated web browser), and no AI agent is yet smart enough to figure out how to interpret the message "Your ISP's network appears to have been used by bot activity. Please write an email to xxx@yyy.zzz with <ABC> as the subject line (or click on this pre-filled link) and you will automatically get unblocked".

[1] https://developers.google.com/search/docs/crawling-indexing/...
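
A minimal sketch of what such a filter might look like, assuming a Flask app, an in-memory ban table, and the reverse-DNS crawler verification from [1]; the looks_like_scraper helper and the exact ban message are placeholders, not the commenter's actual setup:

```python
# Hypothetical sketch: ban a /24 for a month when a request looks like a
# scraper and isn't a verified Google crawler, or announces itself as wget.
# Flask, the in-memory ban table and the helper names are assumptions.
import ipaddress
import socket
import time

from flask import Flask, abort, request

app = Flask(__name__)

BAN_SECONDS = 30 * 24 * 3600   # one month
banned_subnets = {}            # /24 network -> expiry timestamp


def is_verified_google_crawler(ip: str) -> bool:
    """Reverse-DNS check described in [1]: resolve the IP to a hostname,
    require a Google domain, then forward-resolve and require a match."""
    try:
        host = socket.gethostbyaddr(ip)[0]
        if not host.endswith((".googlebot.com", ".google.com")):
            return False
        return socket.gethostbyname(host) == ip
    except OSError:
        return False


def looks_like_scraper(req) -> bool:
    """Placeholder for whatever rate/behaviour heuristic the site uses."""
    return False


@app.before_request
def ban_scrapers():
    ip = request.remote_addr
    subnet = ipaddress.ip_network(f"{ip}/24", strict=False)

    # Already banned: refuse with the unban instructions.
    expiry = banned_subnets.get(subnet)
    if expiry and expiry > time.time():
        abort(403, "Your ISP's network appears to have been used by bot "
                   "activity. Please write an email to xxx@yyy.zzz with "
                   "<ABC> as the subject line and you will get unblocked.")

    user_agent = (request.user_agent.string or "").lower()
    suspicious = looks_like_scraper(request) and not is_verified_google_crawler(ip)
    if suspicious or "wget" in user_agent:
        banned_subnets[subnet] = time.time() + BAN_SECONDS
        abort(403)
```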

  • > Normal users don't do certain site actions at the speed that scraper bots do

    How would you know, when you have already banned them?

    • Simple. A honeypot link buried three levels deep in a menu that no ordinary human would care about, and which, thanks to a JS animation, takes a human at least half a second to click. Any bot that clicks it in less than half a second gets the banhammer. No need for invasive tracking, third-party integrations, whatever. (A rough sketch of the timing check follows below.)

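A rough sketch of that timing check, under the same assumptions as the Flask sketch above; embedding a signed render timestamp in the honeypot URL is one possible way to wire it up, not necessarily what the commenter runs:

```python
# Hypothetical sketch: the honeypot link carries a signed timestamp of when
# the page was rendered; a hit that arrives faster than the half-second a
# human needs to navigate the animated menu bans the caller's /24.
import hashlib
import hmac
import time

from flask import Flask, abort, request

app = Flask(__name__)

SECRET = b"change-me"        # assumption: some server-side secret
MIN_HUMAN_SECONDS = 0.5      # the half-second threshold from the comment


def signed_token(rendered_at: float) -> str:
    sig = hmac.new(SECRET, f"{rendered_at}".encode(), hashlib.sha256).hexdigest()
    return f"{rendered_at}:{sig}"


@app.route("/")
def index():
    # The honeypot link sits deep in the menu markup and is only reachable
    # by a human after the JS animation has played out.
    token = signed_token(time.time())
    return f'... <a href="/honeypot?t={token}" rel="nofollow">archive</a> ...'


@app.route("/honeypot")
def honeypot():
    ts, _, sig = request.args.get("t", "").partition(":")
    try:
        rendered_at = float(ts)
    except ValueError:
        abort(400)
    expected_sig = signed_token(rendered_at).split(":")[1]
    if not hmac.compare_digest(sig, expected_sig):
        abort(400)
    if time.time() - rendered_at < MIN_HUMAN_SECONDS:
        ban_subnet(request.remote_addr)   # the /24 ban from the sketch above
    abort(404)                            # nothing real lives here either way


def ban_subnet(ip: str) -> None:
    """Placeholder for the /24 ban logic sketched earlier."""
```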