Comment by stalluri
9 months ago
Models absorbed the pirated content. Now Meta is distributing those models. Is that considered distribution?
9 months ago
Models absorbed the pirated content. Now Meta is distributing those models. Is that considered distribution?
For that argument I believe the question becomes "is the output of a model considered a derivative work of the training data?"
https://www.copyright.gov/circs/circ14.pdf
What else could it be?
An original composition based on a statistical analysis of the training data. Statistical data about a copyrighted work obviously isn't necessarily a derivative of that work. Otherwise Tolkien could sue me for telling you how many times The Lord of the Rings uses the word "the".
2 replies →
The industry is banking on Author's Guild v. Google to be precedent in such a way that it's functionally transformative enough to be a completely new work.
https://en.wikipedia.org/wiki/Authors_Guild,_Inc._v._Google,....
I think they have about a coin flip of a chance that it passes muster in the courts.
I don't know what the legal answer will be, but I believe it should be considered distribution. A model is basically a highly lossy and extremely compressed copy of its training data, available as a content-addressable database. To anthropomorphize, the model is trying to perfectly replicate its training set, its brain just isn't big enough to do so.
It really should be.
Of course not.
I listened to other people's music and learned some of their songs before writing my own music, that doesn't mean my songs are distribution of theirs.
I read other people's books and short stores and news articles before writing my own, that doesn't mean my writing is distribution of theirs.
How about if I play your song at just the right speed with just the right EQ and I can get an exact reproduction of some of the songs you claim to have written? Because we can get large excerpts of exact copies of short and long form content as demonstrated clearly by the New York Times research on chatbots and their own content.