Comment by Myrmornis
3 months ago
But lyrics are just one example. Are you saying that training experiments must filter out all substrings from the training input that bear too close a resemblance to a substring of a copyrighted work?
Obviously there's a limit: reproducing a single sentence is unlikely to be copyright infringement, simply because there are only so many words in a language. But if reproducing some text would be copyright infringement when a human did it, I don't see why LLM companies should get a free pass.
If it's really essential that they train their models on song lyrics, or books, or movie scripts, or articles, or whatever, they should pay license fees.
At some point, use of the lyrics becomes de minimis.