Comment by alphan0n
6 months ago
If the output of a mathematical model trained on an aggregate of knowledge that contains copyrighted material is derivative and infringing, then ipso facto, all works since the inception of copyright are derivative and infringing.
You learned English, math, social studies, science, business, engineering, humanities, from a McGraw Hill textbook? Sorry, all creative works you’ve produced are derivative of your educational materials copyrighted by the authors and publisher.
> If the output of a mathematical model trained on an aggregate of knowledge that contains copyrighted material is derivative and infringing, then ipso facto, all works since the inception of copyright are derivative and infringing.
I'm not saying every LLM output is necessarily infringing, I'm saying that some are, which means the underlying LLM (considered as a work on its own) must be. If you ask a human to come up with some copy for your magazine ad, they might produce something original, or they might produce something that rips off a copyrighted thing they read. That means that the human themselves must contain enough knowledge of the original to be infringing copyright, if the human was a product you could copy and distribute. It doesn't mean that everything the human produces infringes that copyright.
(Also, humans are capable of original thought of their own - after all, humans created those textbooks in the first place - so even if a human produces something that matches something that was in a textbook, they may have produced it independently. We know the LLM has read pirated copies of all the textbooks, though, so that defense is not available to it.)
You are saying that any output is possibly infringing, dependent on the input. This is actually, factually, verifiably false in terms of current copyright law.
No human, in the current epoch of education where copyright has been applicable, has learned, benefited, or exclusively created anything bereft of copyright. Please provide a proof otherwise if you truly believe so.
> You are saying that any output is possibly infringing, dependent on the input.
What? No. How did you get that from what I wrote? Please engage with the argument I'm actually making, not some imaginary different argument that you're making up.
> No human, in the current epoch of education where copyright has been applicable, has learned, benefited, or exclusively created anything bereft of copyright.
What are you even trying to claim here?
I do appreciate your point because it's one of the interesting side effects of AI to me. Revealing just how much we humans are a stack of inductive reasoning and not-actually-free-willed rehash of all that came before.
Of course, humans are also "trained" on their lived sensory experiences. Most people learn more about ballistics by playing catch than reading a textbook.
When it comes to copyright I don't think the point changes much. See the sibling comments which discuss constructive infringement and liability. Also, it's normal for us to have different rules for humans vs machines / corporations. And scale matters -- a single human just isn't capable of doing what the LLM can. Playing a record for your friends at home isn't a "performance", but playing it to a concert hall audience of thousands is.
My point isn’t adversarial; we most likely (in my most humble opinion) “learn” the same way as anything learns. That is to say, we are not unique in terms of understanding, “understandings”.
Are the ballistics we learn by physical interaction any different from the factual ballistics that, for example, a squirrel learns from its physical interactions?