Comment by marcus_holmes

19 hours ago

Riffing on this:

If the LLM can reproduce the entire GPL'd code, with licence and attribution intact, then that would satisfy the GPL, correct?

If the LLM can invent new code, inspired by but not copied from the GPL'd code, that new code does not require a GPL licence.

This is essentially the same as we humans do: I read some GPL code and go "huh, neat architecture!" and then a year later solve a similar problem using an architecture inspired by that code. This is not copying, and does not require me to GPL the code I'm producing. But if I copy-paste a function from the GPL code into my code base, I need to respect the licence conditions and GPL at least part of my code base.

I think the argument the author is making is about whether the model itself should be GPL'd because it contains copies of GPL'd code that can be reproduced. I don't buy this, because that GPL code is not being run as part of the model's functioning. To use an analogy: if I create a code storage system and then use it to store some GPL code, I don't have to GPL the code storage system itself. As long as it can reproduce the GPL code together with its licence and attribution, the GPL is not being infringed at any point. The system is not using or running the GPL code; it is just storing it. This is what the LLM is doing.
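To make the storage analogy concrete, here is a minimal sketch in Python. Everything in it (`CodeStore`, `StoredFile`, the field names, the sample licence string) is made up for illustration and is not any real system's API. The point is only that the stored code is handled as inert text and always comes back with its licence and attribution attached.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredFile:
    """One stored file plus the metadata that has to travel with it."""
    source: str       # the code itself, kept as plain text
    licence: str      # hypothetical label, e.g. "GPL-3.0-or-later"
    attribution: str  # the original author's copyright notice


class CodeStore:
    """Hypothetical storage system: it saves and returns text, and never executes it."""

    def __init__(self) -> None:
        self._files: dict[str, StoredFile] = {}

    def put(self, name: str, source: str, licence: str, attribution: str) -> None:
        # The source is stored as data only; nothing here imports or runs it.
        self._files[name] = StoredFile(source, licence, attribution)

    def get(self, name: str) -> StoredFile:
        # Retrieval returns code, licence and attribution together, unchanged.
        return self._files[name]


store = CodeStore()
store.put(
    "sort.c",
    "int cmp(int a, int b) { return a - b; }",
    "GPL-3.0-or-later",
    "Copyright (C) Example Author",
)
retrieved = store.get("sort.c")
```

Nothing in `CodeStore` depends on or links against what it stores, which is the distinction I'm drawing: the store's own code and the stored code stay separate, and the stored code keeps its licence and attribution on the way out.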