Comment by jsheard

1 day ago

That, and they were harvesting data way before it was cool, and now that it is cool, they're in a privileged position since almost no one can afford to block Googlebot.

They do voluntarily offer a way to signal that the data Googlebot sees shouldn't be used for training (for now, and assuming you take them at their word), but AFAIK there's no way to stop them from doing RAG on your content without destroying your SEO in the process.

But they also collect the data without causing a denial of service, and they respect robots.txt, which is more than you can say for most LLM scrapers...