Comment by nickff
19 hours ago
I have not seen any evidence that LLMs ‘distribute’ modified software, though they do seem capable of replicating it.
I fail to see how mass scale reproduction of copyrighted code isn't a form of distribution.
Replication is not the same as reproduction; I can replicate an API without violating someone's license or copyright (which I would by reproducing their work).
"Reproduce" is one definition of "replicate", and LLMs have reproduced code.
The view that LLMs should respect open source software licenses is not about replication alone. Models and generated code are derived from training data.
Developers are permitted to learn from open source code with restrictive copyrights, and apply those lessons to developing other software which does not comply with the copyright of their 'example'.
As an aside, I do believe that LLM trainers are ignoring and violating many licenses, but open-source software is not a clear example of a violation.
LLMs are not people. They do not learn the way people do.
Even if they did, if someone memorized copyrighted code and then typed it back out, that would still be a copyright violation.
Depends on how you define "learn": usually, a company wanting to rebuild and publish something under a different license prohibits their developers from having ever looked at original code, to avoid the risk of copying over exact snippets out of their memory accidentally.
Copyright protects only arbitrarily non-trivial parts of the original being reproduced, but that means that you have to be careful with learning from copyrighted material. Programming books will have direct clauses allowing snippet reuse, but not for teaching purposes.
> Sure, but developers are permitted to learn from open source code with restrictive copyrights, and apply those lessons to developing other software which does not comply with the copyright of their 'example'.
This was a different argument. And there is no contradiction in treating LLMs and people separately.
> As an aside, I do believe that LLM trainers are ignoring and violating many licenses, but open-source software is not a clear example of a violation.
How?