Comment by garganzol
1 day ago
Nowadays people complain about AI scrapers in the same vein as they once complained about search indexers. Just a few years later, people stopped caring so much about storage and bandwidth and started begging search engines to visit their websites, trying every trick on Earth: SEO and so on.
Looking forward to the time when everybody suddenly starts to embrace AI indexers and welcome them. History does not repeat itself, but it rhymes.
We already know the solution: One well-behaved, shared scraper could serve all of the AI companies simultaneously.
The problem is that they're not doing it.
This is an interesting approach. Archive.org could be such a solution, kind of. Not the cold storage it is now, but a warm access layer. Sponsorship by AI companies would be a good initiative for the project.
I can't imagine IA ever going for it. You'd need a separate org that just scrapes for AI training, because its bot is going to be blocked by anyone who is anti-AI. It wouldn't make sense for it to serve multiple purposes.
Common Crawl would be a better fit, but still might not want to serve in that capacity.
Search indexing has historically had several orders of magnitude less impact on bandwidth and processing costs for website maintainers.
My recommendation is to copy the text of this article and pass it to an LLM to summarize the key points, since it appears you missed the article's central complaint.
Except robots.txt was the actual solution to search indexing...
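For what it's worth, a minimal sketch of what that opt-out looks like in practice (GPTBot, CCBot, and Google-Extended are the user-agent tokens those crawlers publish; the blanket paths here are just illustrative):

    # robots.txt -- allow search indexing, refuse AI-training crawlers

    User-agent: Googlebot
    Allow: /

    # OpenAI's training crawler
    User-agent: GPTBot
    Disallow: /

    # Common Crawl's crawler
    User-agent: CCBot
    Disallow: /

    # Google's AI-training control token
    User-agent: Google-Extended
    Disallow: /

Of course, it only ever worked because crawlers chose to honor it, which is exactly the complaint upthread about the AI scrapers that don't.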
Bad take. Search engines send people to your site, LLMs don’t.
I visit plenty of sites and pages through links I get from an LLM.