Comment by stalluri

9 months ago

Models absorbed the pirated content. Now Meta is distributing those models. Is that considered distribution?

For that argument I believe the question becomes "is the output of a model considered a derivative work of the training data?"

https://www.copyright.gov/circs/circ14.pdf

I don't know what the legal answer will be, but I believe it should be considered distribution. A model is basically a highly lossy, extremely compressed copy of its training data, available as a content-addressable database. To anthropomorphize: the model is trying to perfectly replicate its training set, but its brain just isn't big enough to do so.

Of course not.

I listened to other people's music and learned some of their songs before writing my own music. That doesn't mean my songs are a distribution of theirs.

I read other people's books, short stories, and news articles before writing my own. That doesn't mean my writing is a distribution of theirs.

  • How about if I play your song at just the right speed with just the right EQ and get an exact reproduction of some of the songs you claim to have written? We can get large excerpts of exact copies of short- and long-form content out of these models, as clearly demonstrated by the New York Times research on chatbots and their own content.