Comment by garganzol
1 day ago
Nowadays people complain about AI scrapers in the same vein as they once complained about search indexers. Just a few years later, people stopped caring so much about storage and bandwidth and started begging search engines to visit their websites, trying every trick on Earth: SEO and so on.
Looking forward to the time when everybody suddenly starts to embrace AI indexers and welcome them. History does not repeat itself, but it rhymes.
We already know the solution: One well-behaved, shared scraper could serve all of the AI companies simultaneously.
The problem is that they're not doing it.
This is an interesting approach. Archive.org could be such a solution, kind of. Not the cold storage it is now, but a warm access layer. Sponsorship by AI companies would be a good initiative for the project.
I can't imagine IA ever going for it. You'd need a separate org that just scrapes for AI training, because its bot is going to be blocked by anyone who is anti-AI. It wouldn't make sense for it to serve multiple purposes.
Common Crawl would be a better fit, but still might not want to serve in that capacity.
Search indexing has historically had several orders of magnitude less impact on bandwidth and processing costs for website maintainers.
My recommendation is to copy the text of this article and pass it to an LLM to summarize the key points, since it appears you missed the article's central complaint.
Except robots.txt was the actual solution to search indexing...
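For what it's worth, a minimal sketch of what that opt-out looks like in practice (GPTBot, CCBot, and Google-Extended are the user-agent tokens those crawlers publish; the blanket paths here are just illustrative):

    # robots.txt -- allow search indexing, refuse AI-training crawlers

    User-agent: Googlebot
    Allow: /

    # OpenAI's training crawler
    User-agent: GPTBot
    Disallow: /

    # Common Crawl's crawler
    User-agent: CCBot
    Disallow: /

    # Google's AI-training control token
    User-agent: Google-Extended
    Disallow: /

Of course, it only ever worked because crawlers chose to honor it, which is exactly the complaint upthread about the AI scrapers that don't.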
Bad take. Search engines send people to your site, LLMs don’t.
I visit plenty of sites and pages through links I get from an LLM.