Comment by __MatrixMan__
1 day ago
I don't understand this mindset. I solve problems on stackoverflow and github because I want those problems to stay solved. If the fixes are more convenient for people to access as weights in an LLM... who cares?
I'd be all for forcing these companies to open source their models. I'm game to hear other proposals. But "just stop contributing to the commons" strikes me as a very negative result here.
We desperately need better legal abstractions for data-about-me and data-I-created so that we can stop using my-data as a one-size-fits-all square peg. Property is just out of place here.
I have mixed opinions on the "AI=theft" argument people make, and I generally lean towards "it's not theft", but I do see the argument.
If I put something on GitHub under a GPLv3 license, anyone with access to the binary is supposed to also have access to the source code. The concern is that, if you consider it theft, someone can train an LLM on your GPL code, and then a for-profit corporation can use that code (or any clever algorithms you've come up with) and effectively "launder" it, making money in the process. It would basically convert your code from copyleft to public domain, which I think a lot of people would have an issue with.
The thing is, LLMs aren’t redistributing your code. You’d have a minuscule chance of an LLM actually reproducing your code verbatim without major modifications.
Copyright and copyleft only come into play when the code itself is distributed, so your last sentence isn't really accurate on a factual level.
I think if you really believe in the free-software ideal, that code should be available to help everyone and that improvements should also be available rather than locked up behind a corporate wall (e.g., a company releasing modified GPL code without redistributing the source), then LLMs should be the least of your worries, because that isn’t what they do. On a literal level, they don’t violate GPLv2/v3.
Perhaps copyright law needs new concepts to respond to this shift in capability, but so far companies and individuals litigating copyright claims against AI companies have had very little success. Direct violations have been rare, and they become rarer as training methods evolve.
Again, I tend to fall more on the “it’s not theft” side of the debate.
That said, haven’t some of the complaints about Copilot and the like been specifically that they reproduce large chunks of code verbatim?
> You’d have a minuscule chance of an LLM actually reproducing your code verbatim without major modifications.
Wait, are you kidding? This is literally a problem we have today with tools like Copilot.
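FWIW, you can probe this empirically: slide a fixed-size window over the model's output and check whether any window appears verbatim in a local corpus of licensed code. A minimal sketch in Python; the 6-line window, the corpus path, and the `*.py` glob are my own illustrative assumptions, not how Copilot or any vendor actually works:

```python
# Minimal sketch: flag verbatim overlap between generated code and a local
# corpus of licensed files by comparing fixed-size windows of non-blank lines.
# WINDOW, the corpus path, and the "*.py" glob are illustrative assumptions.
from pathlib import Path

WINDOW = 6  # consecutive matching non-blank lines count as "verbatim"

def line_windows(text: str, n: int = WINDOW) -> set[str]:
    """Return every run of n consecutive stripped, non-blank lines."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    return {"\n".join(lines[i:i + n]) for i in range(len(lines) - n + 1)}

def verbatim_hits(generated: str, corpus_dir: str) -> list[Path]:
    """List corpus files sharing at least one window with the generated code."""
    gen = line_windows(generated)
    return [
        path
        for path in Path(corpus_dir).rglob("*.py")
        if gen & line_windows(path.read_text(errors="ignore"))
    ]

# Hypothetical usage:
# hits = verbatim_hits(model_output, "gpl_corpus/")
# print(hits or "no 6-line verbatim overlap found")
```

A crude check like this both over- and under-counts (it ignores renamed identifiers, for one), but it's enough to show that verbatim reproduction is a measurable question, not a hypothetical.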
I find it very easy to understand: people generally don’t want to work for free to support billionaires, and they have few venues to act on that. This is one of them.
There are no “commons” in this scenario. A few frontier labs own everything (having taken it without attribution), and they have the power to take it away, or to raise prices to the point where it becomes a tool for the rich.
Nobody is doing this for the good of anything, it’s a money grab.
Were these contributions not a radical act against zero-sum games in the first place? And now you're gonna let the zero-sum people win by restricting your own outputs to similarly zero-sum endeavors?
I don't wanna look a gift horse in the mouth here. I'm happy to have benefited from whatever contributions were originally forthcoming and I wouldn't begrudge anybody for no longer going above and beyond and instead reverting to normal behavior.
I just don't get it. It's like you're opposed to people building walls, but you see a particularly large wall that makes you mad, so your response is to go build a wall yourself.
It's not about building a wall. It's about ensuring that the terms of the license chosen by the author are respected.
This is why I think permissive licenses are a mistake for most projects. Unlike copyleft licenses, they let users strip the freedoms they themselves enjoyed from users of derivative works. It's no surprise that dishonest actors take advantage of this for their own gain. This is the paradox of tolerance.
"AI" companies take this a step further, and completely disregard the original license. Whereas copyleft would somewhat be a deterrent for potential abusers, it's not for this new wave of companies. They can hide behind the already loosely defined legal frameworks, and claim that the data is derivative enough, or impossible to trace back, or what have you. It's dishonest at best, and corrupts the last remnants of public good will we still enjoy on the internet.
We need new legal frameworks for this technology, but since that is a glacial process, companies can get rich in the meantime. Especially shovel salespeople.