Comment by Ndymium
16 hours ago
I had this same issue recently. My Forgejo instance started using 100% of my home server's CPU because Claude and its AI friends from Meta and Google were hitting its basically infinite set of links at a high rate. I managed to curtail it with robots.txt and a user-agent-based blocklist in Caddy, but who knows how long that will work.
Whatever happened to courtesy in scraping?
> Whatever happened to courtesy in scraping?
Money happened. AI companies are financially incentivized to take as much data as possible, as quickly as possible, from anywhere they can get it, and for now they have so much cash to burn that they don't really need to be efficient about it.
Not only money, but also a culture of "all your data belong to us," because our AI is going to save you and the world.
The hubris reminds me of the dot-com era. That bust left a lot of wreckage. Not sure how this one is going to land.
It's gonna be rough. If you can't make money charging people $200 a month for your service then something is deeply wrong.
Need to act fast before the copyright cases in the courts get handled.
> Whatever happened to courtesy in scraping?
Once various companies got the signal that, at least for now, there is a huge Overton window for what is acceptable for AI to ingest, they were always going to take all they can before regulation even tries to clamp down.
The bigger danger is that one of these companies, even (or especially) one that claims to be 'Open', does so and then gets to the point of being considered 'too big to fail' for economic or natsec reasons...
Mind sharing a decent robots.txt and/or user-agent list to block the AI crawlers?
Linked upthread: https://github.com/ai-robots-txt/ai.robots.txt/blob/main/rob...
Any of the big chat models should be able to reproduce it :)
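For what it's worth, here is a rough Python sketch of the two measures mentioned upthread: a robots.txt group and a user-agent match. The token list is only an illustrative assumption (it is not the contents of the linked file and is certainly incomplete), and the names `AI_CRAWLER_TOKENS`, `build_robots_txt` and `is_blocked` are made up for this example.

```python
# Illustrative, incomplete list of crawler tokens (an assumption for this
# sketch); check each vendor's documentation or a maintained list.
AI_CRAWLER_TOKENS = ["GPTBot", "ClaudeBot", "meta-externalagent", "CCBot", "Bytespider"]


def build_robots_txt(tokens):
    """Return robots.txt text asking every listed crawler to stay out entirely."""
    # Several User-agent lines followed by one rule form a single group.
    lines = [f"User-agent: {token}" for token in tokens]
    lines.append("Disallow: /")
    return "\n".join(lines) + "\n"


def is_blocked(user_agent, tokens=AI_CRAWLER_TOKENS):
    """Case-insensitive substring match of a User-Agent header against the tokens."""
    ua = user_agent.lower()
    return any(token.lower() in ua for token in tokens)


if __name__ == "__main__":
    print(build_robots_txt(AI_CRAWLER_TOKENS))
    print(is_blocked("Mozilla/5.0 (compatible; GPTBot/1.0)"))
```

Keep in mind that robots.txt is purely advisory, and a user-agent match only catches crawlers that identify themselves honestly, so the check would still have to be enforced in the reverse proxy or the application.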
When will the last Hacker News reader realize that Meta and OpenAI and every last massive tech company were always going to screw us all over for a quick buck?
Remember, Facebook famously made it easy to scrape your friends from MySpace, and then banned that exact same activity from their site once they got big.
Wake the f*ck up.
The same thing that happened to courtesy in every other context: it only existed in contexts where there was no profit to be made in ignoring it. The instant that stopped being true, it was ejected.