Comment by friendzis
1 day ago
> Genuine question: if I train my model with copyleft material, how do you prove I did?
An inverse of this question is arguably even more relevant: how do you prove that the output of your model is not copyrighted (or otherwise encumbered) material?
In other words, if your model was trained strictly on copyleft material but, when properly prompted, outputs a copyrighted work, is that copyright infringement, and if so, by whom?
Do not limit your thoughts to text only. "Draw me a cartoon picture of an anthropomorphic with round black ears, red shorts and yellow boots". Does it matter if the training set was all copyleft if the final output is indistinguishable from a copyrighted character?
> even if your model was trained strictly on copyleft material
That's not legal use of the material according to most copyleft licenses, regardless of whether you end up trying to reproduce it. It's also quite immoral, even if technically-strictly-speaking-maybe-not-unlawful.
> That's not legal use of the material according to most copyleft licenses.
That probably doesn't matter given the current rulings that training an AI model on otherwise legally acquired material is "fair use", because the copyleft license inherently only has power because of copyright.
I'm sure at some point we'll see litigation over a case where someone attempts to make "not using the material to train AI" a term of the sales contract for something, but my guess would be that if that went anywhere it would be on the back of contract law, not copyright law.
Indeed, the GPL's definitions of "modify" and "propagate" restrict the license's scope to actions that would otherwise infringe on copyright if not permitted. And fair use and similar doctrines generally act as carve-outs to copyright infringement.
I was referencing the wording of the comment I was replying to; you can safely replace "copyleft" with "public domain" and the argument still stands. Your comment's focus on the minutiae of training, however, highlights how relevant the discussion around outputs in particular is.
edit: wording.