Comment by jxjnskkzxxhx

4 days ago

Do you have a reason to believe this isn't already being done? I would assume that the big players like OpenAI are already training on basically all text in existence.

In fact, Facebook torrented Anna's Archive and got busted for it, because of course they did:

https://torrentfreak.com/meta-torrented-over-81-tb-of-data-t...

  • Every LLM maker probably did the same. Facebook just has disgruntled employees who leaked it.

    • Google goes around legally scanning every book they can get their hands on for books.google.com, and every paper they can get their hands on for scholar.google.com.

      I doubt they'd resort to piracy for what is basically the same information as what they've already legally acquired...
