Comment by TuringNYC
20 days ago
>> One of my websites was absolutely destroyed by Meta's AI bot: Meta-ExternalAgent https://developers.facebook.com/docs/sharing/webmasters/web-...
Are they not respecting robots.txt?
Quoting the top-level link to geraspora.de:
> Oh, and of course, they don’t just crawl a page once and then move on. Oh, no, they come back every 6 hours because lol why not. They also don’t give a single flying fuck about robots.txt, because why should they. And the best thing of all: they crawl the stupidest pages possible. Recently, both ChatGPT and Amazon were - at the same time - crawling the entire edit history of the wiki.
Edit history of a wiki sounds much more interesting than the current snapshot if you want to train a model.
Does that information improve or worsen the training?
Does it justify the resource demands?
Who pays for those resources and who benefits?