Comment by dangus
10 hours ago
I both agree and disagree with you.
The thing is, copyright law is not really on your side. Viewing copyrighted material without paying for it is not generally something people get fined for. A lot of training falls under fair use, which overrides whatever license you come up with. Disney can’t stop me from uploading clips of their movies alongside commentary and review because fair use allows that. LLMs generally aren’t redistributing code, and redistribution is the thing that copyright restricts.
If I inspect some GPL code, get inspired by it, and write something similar, the GPL doesn’t apply to me.
It has always been the case that if you don’t want other people to apply fair use to your works, your only recourse is to keep those works private. I suspect that now individuals and companies that don’t want their code to be trained on will simply keep the code private.
Now, there have been times where LLMs have reproduced copyrighted material verbatim. The NYTimes sued OpenAI over this issue. I believe they’ve settled and come up with a licensing scheme unless I’m mixing up my news stories.
Second thing, your issue becomes moot if there exists a model that trains only on MIT-licensed code, and there is a TON of that code out there.
Third thing, your issue becomes moot if users have agreed to submit their code for training, as the GitHub ToS does for users who don’t change their settings, or if giant companies with giant code bases just use their own code to train LLMs.
Where I agree with you is that perhaps copyright law should evolve. Still, I think there’s a practical “cat is out of the bag” issue.