Comment by tombert

20 hours ago

I have mixed opinions on the "AI=theft" argument people make, and I generally lean towards "it's not theft", but I do see the argument.

If I put something on GitHub under a GPLv3 license, that's supposed to require that anyone with access to the binary also has access to the source code. The concern, for those who do consider it theft, is that someone can train an LLM on your GPL code, and then a for-profit corporation can use that code (or any clever algorithms you've come up with) and effectively "launder" it, making money in the process. That would effectively convert your code from copyleft to public domain, which I think a lot of people would have an issue with.

The thing is, LLMs aren’t redistributing your code. You’d have a minuscule chance of an LLM actually reproducing your code verbatim without major modifications.

Copyright and copyleft only deal with distribution of the code, so your last sentence isn't really true as a factual matter.

I think if you really believe in the open source, free software ethos that code should be available to help everyone, and that improvements should likewise be available rather than locked up behind a corporate wall (e.g., a company shipping modified GPL code without redistributing the source), then LLMs should be the least of your worries, since they don't do that. On a literal level they don't violate GPLv2 or GPLv3.

Perhaps copyright law needs new concepts to respond to this change in capability, but so far companies and individuals trying to litigate against AI companies for copyright violations have had very little legal success. Direct violations have been rare and only become rarer as training methods evolve.

  • Again, I tend to fall more on the “it’s not theft” side of the debate.

    That said, hasn’t a large part of the complaint about Copilot and the like been specifically that they reproduce large chunks of code verbatim?

  • > You’d have a minuscule chance of an LLM actually reproducing your code verbatim without major modifications.

    Wait, are you kidding? This is literally a problem we have today with tools like Copilot.