Comment by tombert

20 hours ago

I have mixed opinions on the "AI=theft" argument people make, and I generally lean towards "it's not theft", but I do see the argument.

If I put something on GitHub under a GPLv3 license, that's supposed to require that anyone with access to the binary also has access to the source code. The concern, for those who do consider it theft, is that someone can train an LLM on your GPL code, and then a for-profit corporation can use that code (or any clever algorithms you've come up with) and effectively "launder" it, making money in the process. That would effectively convert your code from copyleft to public domain, which I think a lot of people would have an issue with.

The thing is, LLMs aren’t redistributing your code. You’d have a minuscule chance of an LLM actually reproducing your code verbatim without major modifications.

Copyright and copyleft only deal with distribution of the code, so your last sentence isn't really true as a factual matter.

I think if you really believe in the open source, free software ethos that code should be available to help everyone, and that improvements should likewise be available rather than locked up behind a corporate wall (e.g., a company shipping modified GPL code without redistributing the source), then LLMs should be the least of your worries, since they don't do that. On a literal level they don't violate GPLv2 or GPLv3.

Perhaps copyright law needs new concepts to respond to this change in capability, but so far companies and individuals trying to litigate against AI companies for copyright violations have had very little legal success. Direct violations have been rare and only become rarer as training methods evolve.

  • Again, I tend to fall more on the “it’s not theft” side of the debate.

    That said, hasn’t a large part of the complaint about Copilot and the like been specifically that they reproduce large chunks of code verbatim?

  • > You’d have a minuscule chance of an LLM actually reproducing your code verbatim without major modifications.

    Wait, are you kidding? This is literally a problem we have today with tools like Copilot.