Comment by stalluri

1 year ago

Models absorbed the pirated content. Now Meta is distributing those models. Is that considered distribution?


stalluri

For that argument, I believe the question becomes "is the output of a model considered a derivative work of the training data?"

https://www.copyright.gov/circs/circ14.pdf

ninalanyon 1 year ago
What else could it be?
- Ajedi32 1 year ago
  
  An original composition based on a statistical analysis of the training data. Statistical data about a copyrighted work obviously isn't necessarily a derivative of that work. Otherwise Tolkien could sue me for telling you how many times The Lord of the Rings uses the word "the".
- monocasa 1 year ago
  
  The industry is banking on Authors Guild v. Google serving as precedent that a model is transformative enough to count as a completely new work.
  https://en.wikipedia.org/wiki/Authors_Guild,_Inc._v._Google,....
  I think they have roughly a coin-flip chance of that argument passing muster in the courts.

aezart 1 year ago

I don't know what the legal answer will be, but I believe it should be considered distribution. A model is basically a highly lossy and extremely compressed copy of its training data, available as a content-addressable database. To anthropomorphize, the model is trying to perfectly replicate its training set, its brain just isn't big enough to do so.

WXLCKNO 1 year ago

It really should be.

bodiekane 1 year ago

Of course not.

I listened to other people's music and learned some of their songs before writing my own music; that doesn't mean my songs are distribution of theirs.

I read other people's books, short stories and news articles before writing my own; that doesn't mean my writing is distribution of theirs.

asadotzler 1 year ago

How about if I play your song at just the right speed with just the right EQ and get an exact reproduction of some of the songs you claim to have written? That's the situation here: we can get large excerpts that are exact copies of both short- and long-form content, as the New York Times clearly demonstrated in its research on chatbots reproducing its own content.