Comment by jxjnskkzxxhx
4 days ago
Do you have a reason to believe this ain't already being done? I would assume that the big guys like OpenAI are already training on basically all text in existence.
In fact, Facebook torrented Anna's Archive and got busted for it, because of course they did:
https://torrentfreak.com/meta-torrented-over-81-tb-of-data-t...
Every LLM maker probably did the same. Facebook just has disgruntled employees who leaked it.
Google goes around legally scanning every book they can get their hands on with books.google.com, and every paper they can get their hands on with scholar.google.com.
I doubt they'd resort to piracy for what is basically the same information as what they've already legally acquired...
Wasn't it confirmed that this is what Meta does?
https://www.forbes.com/sites/danpontefract/2025/03/25/author...