Comment by Xeamek
1 year ago
Does Google effectively get a pass, because they (can) use the same bot to index websites for search and to scrape data for AI model training at the same time?
Google does get a pass, since they use Googlebot to scrape content, but they then check robots.txt for a "Google-Extended" rule to decide, voluntarily, whether they can use that content for LLM training[1].
I assume Microsoft intends to do the same, given they have Bing and their recent stance on the matter[2].
[1] https://developers.google.com/search/docs/crawling-indexing/...
[2] https://www.businesstoday.in/technology/news/story/microsoft...
Google does voluntarily allow robots.txt to be configured so that they will index pages while promising not to use the content for training. But yeah, if Google decided to go rogue, there wouldn't really be anything site owners could do about it without killing their presence in Google's index.
https://searchengineland.com/google-extended-crawler-432636
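For anyone who hasn't set this up, here is a rough sketch of the configuration described above, checked with Python's standard urllib.robotparser. The example.com URL and the variable names are made up for illustration. It only shows what the robots.txt file says; whether the content is actually kept out of training is still a voluntary promise on Google's side.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: opt out of AI training via the
# "Google-Extended" token, but leave normal crawling open to everyone.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# example.com is a placeholder URL, not a real site from this thread.
url = "https://example.com/some-post"

# Googlebot falls through to the "*" group, so search indexing is allowed.
can_index = parser.can_fetch("Googlebot", url)

# The Google-Extended group disallows everything.
can_train = parser.can_fetch("Google-Extended", url)

print("Googlebot allowed:", can_index)
print("Google-Extended allowed:", can_train)
```

Note that this is just the site owner's side of the deal. Nothing in robots.txt enforces anything, which is the whole point being made above.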