Comment by csneeky
7 days ago
Is it really the case that companies like OpenAI and Anthropic will repeatedly visit this archive and slurp it all up each time they train something? Wouldn’t that just be a one-time thing (to get their own copy), with maybe the odd visit to get updates? My take is that the article is about monetizing unique training info, and I see them being paid maybe 10-20 times a year by folks building LLMs, which is maybe nothing and maybe $$$$. I don’t know.
Not a doctor, but in Anthropic's case they bought actual books and scanned them rather than using pirated versions. For digital versions from a vendor that were found to be in violation of the ToS, they paid to settle the issue. https://www.npr.org/2025/09/05/nx-s1-5529404/anthropic-settl...