Comment by ninalanyon

9 months ago

What else could it be?

An original composition based on a statistical analysis of the training data. Statistical data about a copyrighted work obviously isn't necessarily a derivative of that work. Otherwise Tolkien could sue me for telling you how many times The Lord of the Rings uses the word "the".

  • Can it reproduce training data? Then it's not analysis but compression, lossy compression.

    • For most LLMs, with most works, no.

      If you trained an LLM repeatedly on nothing but the text of LOTR until it could reproduce the books verbatim and then tried to sell copies of that LLM, then I agree that would be blatant copyright infringement, yes.