Comment by alphan0n
6 months ago
Right, but the onus is on the end user who publishes the song or creative work in violation of copyright, not on the text editor, word processor, musical notation software, etc., correct?
A text prediction tool isn’t a person, the data it is trained on is irrelevant to the copyright infringement perpetrated by the end user. They should perform due diligence to prevent liability.
> A text prediction tool isn’t a person, the data it is trained on is irrelevant to the copyright infringement perpetrated by the end user. They should perform due diligence to prevent liability.
Huh what? If a program "predicts" some data that is a derivative work of some copyrighted work (that the end user did not input), then ipso facto the tool itself is a derivative work of that copyrighted work, and illegal to distribute without permission. (Does that mean it's also illegal to publish and redistribute the brain of a human who's memorised a copyrighted work? Probably. I don't have a problem with that). How can it possibly be the user's responsibility when the user has never seen the copyrighted work being infringed on, only the software maker has?
And if you say that OpenAI isn't distributing their program but just offering it as a service, then we're back to the original situation: in that case OpenAI is illegally distributing derivative works of copyrighted works without permission. It's not even a YouTube-like situation where some user uploaded the copyrighted work and they're just distributing it; OpenAI added the pirated books themselves.
If the output of a mathematical model trained on an aggregate of knowledge that contains copyrighted material is derivative and infringing, then ipso facto, all works since the inception of copyright are derivative and infringing.
You learned English, math, social studies, science, business, engineering, humanities, from a McGraw Hill textbook? Sorry, all creative works you’ve produced are derivative of your educational materials copyrighted by the authors and publisher.
> If the output of a mathematical model trained on an aggregate of knowledge that contains copyrighted material is derivative and infringing, then ipso facto, all works since the inception of copyright are derivative and infringing.
I'm not saying every LLM output is necessarily infringing, I'm saying that some are, which means the underlying LLM (considered as a work on its own) must be. If you ask a human to come up with some copy for your magazine ad, they might produce something original, or they might produce something that rips off a copyrighted thing they read. That means that the human themselves must contain enough knowledge of the original to be infringing copyright, if the human was a product you could copy and distribute. It doesn't mean that everything the human produces infringes that copyright.
(Also, humans are capable of original thought of their own - after all, humans created those textbooks in the first place - so even if a human produces something that matches something that was in a textbook, they may have produced it independently. Whereas we know the LLM has read pirated copies of all the textbooks, so that defense is not available)
I do appreciate your point because it's one of the interesting side effects of AI to me. Revealing just how much we humans are a stack of inductive reasoning and not-actually-free-willed rehash of all that came before.
Of course, humans are also "trained" on their lived sensory experiences. Most people learn more about ballistics by playing catch than reading a textbook.
When it comes to copyright I don't think the point changes much. See the sibling comments which discuss constructive infringement and liability. Also, it's normal for us to have different rules for humans vs machines / corporations. And scale matters -- a single human just isn't capable of doing what the LLM can. Playing a record for your friends at home isn't a "performance", but playing it to a concert hall audience of thousands is.
Those software tools don't generate content the way an LLM does so they aren't particularly relevant.
It's more like if I hire a firm to write a book for me and they produce a derivative work. Both of us have a responsibility to guard against that.
Unfortunately there is no definitive way to tell if something is sufficiently transformative or not. It's going to come down to the subjective opinion of a court.
Copyright law is pretty clear on commissioned work: you are the holder. If your employee violated copyright and you failed to do your due diligence before publication, then you are responsible. If your employee violated copyright and fraudulently presented the work as original to you, then you would seek compensation from them.
> Copyright law is pretty clear on commissioned work: you are the holder. If your employee violated copyright and you failed to do your due diligence before publication, then you are responsible.
No, for commissioned work in the usual sense the person you commissioned from is the copyright holder; you might have them transfer the copyright to you as part of your contract with them but it doesn't happen by default. It is in no way your responsibility to "do due diligence" on something you commissioned from someone, it is their responsibility to produce original work and/or appropriately license anything they based their work on. If your employee violates copyright in the course of working for you then you might be responsible for that, but that's for the same reason that you might be responsible for any other crimes your employee might commit in the course of working for you, not because you have some special copyright-specific responsibility.
How is the end user the one doing the infringement though? If I chat with ChatGPT and tell it „give me the first chapter of book XYZ“ and it gives me the text of the first chapter, OpenAI is distributing a copyrighted work without permission.
Can you do that though? Just ask ChatGPT to give you the first chapter of a book and it gives it to you?
https://news.ycombinator.com/item?id=42767775
Not a book chapter specifically but this could already be considered copyright infringement, I think.