Comment by johnisgood

11 hours ago

> Right now it's very easy not to infringe on copyrighted code if you write the code yourself.

Humans routinely produce code similar to or identical to existing copyrighted code without direct copying.

They don’t produce enough similar code to infringe frequently. And if they did independent creation is an affirmative defense to copyright infringement that likely doesn’t apply to LLMs since they have the demonstrated capability to produce code directly from their training set.

  • You have shifted from "very easy not to infringe" to "don't infringe frequently", which concedes the original point that humans can and do produce infringing code without intent.

    On independent creation: you are conflating the tool with the user. The defense applies to whether the developer had access to the copyrighted work, not whether their tools did. A developer using an LLM did not access the training set directly, they used a synthesis tool. By your logic, any developer who has read GPL code on GitHub should lose independent creation defense because they have "demonstrated capability to produce code directly from" their memory.

    LLM memorization/regurgitation is a documented failure mode, not normal operation (nor typical case). Training set contamination happens, but it is rare and considered a bug. Humans also occasionally reproduce code from memory: we do not deny them independent creation defense wholesale because of that capability!

    In any case, the legal question is not settled, but the argument that LLM-assisted code categorically cannot qualify for independent creation defense creates a double standard that human-written code does not face.

And that's not an infringement. Actual copying is the infringement, not having the same code. The most likely way to have the same code is by copying, but it's not the only way.