Comment by Xeamek

2 years ago

Does Google effectively get a pass, because they use (or can use) the same bot both to index websites for search and to scrape data for AI model training?

supriyo-biswas 2 years ago

Google does get a pass: they use Googlebot to scrape content, then check robots.txt for the "Google-Extended" token to voluntarily decide whether they can use that content for LLM training[1].
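To make the split concrete, here is a minimal sketch using Python's standard-library `urllib.robotparser` to read a robots.txt that allows Googlebot for Search but disallows the Google-Extended token. The robots.txt contents, the `check` helper, and the example.com URL are hypothetical illustrations. Google-Extended is a control token rather than a separate fetching crawler, so this only shows how a compliant reader would interpret the rules; it can't tell you what Google actually does with the content once Googlebot has fetched it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: allow Googlebot (Search indexing) everywhere,
# but opt out of the Google-Extended token (AI training use).
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: Google-Extended
Disallow: /
"""


def check(agent: str, url: str) -> bool:
    """Return True if the rules above permit `agent` to use `url`."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines and marks the rules as loaded.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for agent in ("Googlebot", "Google-Extended"):
        print(agent, check(agent, "https://example.com/article"))
```

Both answers come from the same file, which is the crux of the thread: the opt-out only works if the party reading it chooses to honor the Google-Extended group.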

I assume Microsoft intends to do the same, given that they run Bing and their recent stance on the matter[2].

[1] https://developers.google.com/search/docs/crawling-indexing/...

[2] https://www.businesstoday.in/technology/news/story/microsoft...

jsheard 2 years ago

Google does voluntarily let site owners configure robots.txt so that their pages are indexed while Google promises not to use the content for training. But yeah, if Google decided to go rogue, there wouldn't really be anything site owners could do about it without killing their presence in Google's index.

jgrahamc 2 years ago

https://searchengineland.com/google-extended-crawler-432636