Comment by Vegenoid

13 hours ago

> Let's say I look (as a human) at some GPL source code. And then I close the browser tab and roughly re-implement from memory what I saw. Am I now required to release my own code as GPL? More extr[…] resembles what I saw back then, then I can, in your universe, be sued because only "libre programmers" may read "libre source code".

It depends entirely on how similar the code you write is to the licensed code you saw, and on what could be proved about what you saw, but potentially yes: if you study GPL code and then write code that is distinctively similar to it, you may have infringed the author's copyright. US courts have issued some rulings saying that the substantial similarity standard does apply to software, although pretty much every ruling in these cases ends up in favor of the defendant (the one who allegedly "copied" some software).

> So, for LLMs, even if the input is GPL, proprietary, whatever: if the output is unrecognizable from the input, it does not matter.

Sure, but that doesn't apply to this instance. This is implementing a BSD driver based on a Linux driver for that hardware. I'm not making the general case that LLMs are committing copyright infringement on a grand scale. I'm saying that giving GPL code to an LLM (in this case the GPL code was input to the model, which seems much more egregious than it being in the training data) and having the LLM generate that code ported to a new platform feels slimy. If we can do this, then copyleft licenses will become pretty much meaningless. I gather some people would consider that a win.