Comment by eli
16 days ago
Seems like an open question as to whether that violates any laws.
Another way to look at it is that if you publish a service on the web, you have limited rights to restrict what people do with it.
Isn't that the logic Google search relies on in the first place? I didn't give permission for Google to crawl and index and deep link to my site (let alone summarize and train LLMs on it). They just did it anyway, because it's on a public website.
Google's stance is "I can copy you and you can't stop me" as well as "You can't copy me, I'll sue you."
Maybe it has changed, but Google doesn't seem to use litigation as its primary weapon. It defends itself but rarely attacks.
They are, however, more than happy to use technical measures, like blocking accounts. And because of their position, blocking your Google account may be more damaging than a successful lawsuit.
Google at least claims that noindex will keep your site from getting crawled [1]. Do people think this is false?
[1] https://developers.google.com/search/docs/crawling-indexing/...
Strictly speaking no, that doesn’t prevent crawling - at the least Googlebot has to fetch the page to see the meta tag or the robots.txt to see what’s allowed, and it will periodically recheck for changes.
It doesn’t even reliably prevent indexing. If a page is blocked in robots.txt but linked from elsewhere, Google can still show its URL in search results, because it never fetches the page and so never sees the noindex.
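To make the crawl-versus-index distinction concrete, here is a rough Python sketch of the two checks a well-behaved crawler would have to make. It is not how Googlebot actually works; the robots.txt body, the page HTML, the `ExampleBot` user agent, and the helper names `may_crawl` and `has_noindex` are all made up for illustration. The point is only that the crawler needs the robots.txt contents to know what it may fetch, and needs the page body itself to see a `noindex` meta tag.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt and page body, inlined so the sketch needs no network.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

PAGE_HTML = (
    '<html><head><meta name="robots" content="noindex, nofollow"></head>'
    "<body>hello</body></html>"
)


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Crawl check: requires the robots.txt contents to have been fetched."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def has_noindex(html: str) -> bool:
    """Index check: requires the page body itself to have been fetched."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


if __name__ == "__main__":
    print(may_crawl(ROBOTS_TXT, "ExampleBot", "https://example.com/public.html"))
    print(may_crawl(ROBOTS_TXT, "ExampleBot", "https://example.com/private/a.html"))
    print(has_noindex(PAGE_HTML))
```

Note that `has_noindex` can only run on a page the crawler was allowed to download, so a page disallowed in robots.txt never gets its meta tag read at all.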
And why does Google get to set these rules on my site anyway? I didn’t agree to them.