Comment by jxjnskkzxxhx

9 months ago

Do you have a reason to believe this ain't already being done? I would assume that the big guys like OpenAI are already training on basically all text in existence.

7 comments

IlikeKitties 9 months ago

In fact, Facebook torrented Anna's Archive and got busted for it, because of course they did:

https://torrentfreak.com/meta-torrented-over-81-tb-of-data-t...

HDThoreaun 9 months ago
Every LLM maker probably did the same. Facebook just has disgruntled employees who leaked it.
- gpm 9 months ago
  
  Google goes around legally scanning every book they can get their hands on with books.google.com, and every paper they can get their hands on with scholar.google.com.
  I doubt they'd resort to piracy for what is basically the same information as what they've already legally acquired...
  

ar_lan 9 months ago

Wasn't it confirmed that Meta does this?

https://www.forbes.com/sites/danpontefract/2025/03/25/author...