Comment by james2doyle
4 hours ago
You call it extortion of the AI companies, but isn’t stealing/crawling/hammering a site to scrape their content to resell just as nefarious? I would say Cloudflare is giving these site owners an option to protect their content and as a byproduct, reduce their own costs of subsidizing their thieves. They can choose to turn off the crawl protection. If they aren't, that tells you that they want it, doesn’t it?
>You call it extortion of the AI companies, but isn’t stealing/crawling/hammering a site to scrape their content to resell just as nefarious?
You can easily block ChatGPT and most other AI scrapers if you want:
https://habeasdata.neocities.org/ai-bots
This is just using robots.txt and asking "pretty please, don’t scrape me".
Here is an article (from TODAY) about the case where Perplexity is being accused of ignoring robots.txt: https://www.theverge.com/news/839006/new-york-times-perplexi...
If you think a robots.txt is the answer to stopping the billion-dollar AI machine from scraping you, I don’t know what to say.
I'm guessing you don't manage any production web servers?
robots.txt isn't even respected by all of the American companies. Chinese ones (which often also use what are essentially botnets in Latin American and the rest of the world to evade detection) certainly don't care about anything short of dropping their packets.
[dead]