Comment by froh

1 day ago

> The spirit of the GPL is the freedom of the user, not the code being freely shared.

who do you mean by "user"?

the spirit is that the person who actually uses the software also has the freedom to modify it, and that the users receiving these modifications have the same rights.

is that what you meant?

and while technically that's the spirit of the GPL, the license is not only about users, but about a _relationship_, that of the user and the software and what the user is allowed to do with the software.

it thus makes sense to talk about "software freedom".

last but not least, about a single GPL function: many GPL _libraries_ are licensed less restrictively, under the LGPL.

I don't think you understand the GPL.

> "the user is allowed to do with the software"

The GPL does not restrict what the user does with the software.

It can be USED for anything.

But it does restrict how you redistribute it. You have responsibilities if you redistribute it. You must provide the source code, and pass on the same freedoms you received to the users you redistribute it to.

  • Thinking on it though, if the models are trained on any GPL code then one could consider that they contain that GPL code, and are constantly and continually updating and modifying that code, thus everything the model subsequently outputs and distributes should come under the GPL too. It’s far from sufficient that, say, OpenAI have a page on their website to redistribute the code they consume in their models if such code becomes part of the model’s training data that is resident in memory every time it produces new code for users. In the spirit of the GPL all that derivative code seems to also come under the GPL, and has to be made available for free, even if upon every request the generated code is somehow novel or unique to that user.

    • Riffing on this:

      If the LLM can reproduce the entire GPL'd code, with licence and attribution intact, then that would satisfy the GPL, correct?

      If the LLM can invent new code, inspired by but not copied from the GPL'd code, that new code does not require a GPL licence.

      This is essentially the same as what we humans do: I read some GPL code and go "huh, neat architecture!" and then a year later solve a similar problem using an architecture inspired by that code. This is not copying, and does not require me to GPL the code I'm producing. But if I copy-paste a function from the GPL code into my code base, I need to respect the licence conditions and GPL at least part of my code base.

      I think the argument that the author is talking about is whether the model itself should be GPL'd because it contains copies of GPL'd code that can be reproduced. I don't buy this because that GPL code is not being run as part of the model's functioning. To use an analogy: if I create a code storage system, and then use it to store some GPL code, I don't have to GPL the code storage system itself. As long as it can reproduce the GPL code together with its licence and attribution, then the GPL is not being infringed at any point. The system is not using or running the GPL code itself, it is just storing the GPL code. This is what the LLM is doing.

    • > Thinking on it though, if the models are trained on any GPL code then one could consider that they contain that GPL code, and are constantly and continually updating and modifying that code, thus everything the model subsequently outputs and distributes should come under the GPL too.

      If you ask a model to output a task scheduler in C, and the training data contained a GPL-licensed implementation of the Fibonacci function in Haskell, the output isn't likely to bear a lot of resemblance to that input. It might even be unrelated enough that adding that function to the training data doesn't affect what the model outputs for that prompt at all.

      The nasty thing in terms of using code generated by these things is that if you ask the model to output a task scheduler in C and the training data contained a GPL-licensed implementation of a task scheduler in C, the output plausibly could bear a strong resemblance to that input, without you knowing it. And then if you go and incorporate that into something you're redistributing, what happens?

  • first I thought you'd go into the nuance of gpl2 vs 3 or lgpl vs gpl vs agpl? patents, tivoization, cloud use?

    :-)

    I agree, I didn't make any statement about what you can do with the software as long as you are licensed to use it

    you are allowed to build atomic bombs, nuclear power plants, tanks, whatever.

    but only as long as you comply, i.e. give your downstream the freedoms you've received.

    if you fail at that, you're no longer allowed to use the software for anything.

    see section 8 Termination for details

    https://www.gnu.org/licenses/gpl-3.0.html#license-text