Comment by iterateoften
1 day ago
I stopped writing open source projects on GitHub. Why put a bunch of work into something for others to train on without any regard for the original projects?
I don't understand this mindset. I solve problems on Stack Overflow and GitHub because I want those problems to stay solved. If the fixes are more convenient for people to access as weights in an LLM... who cares?
I'd be all for forcing these companies to open source their models. I'm game to hear other proposals. But "just stop contributing to the commons" strikes me as a very negative result here.
We desperately need better legal abstractions for data-about-me and data-I-created so that we can stop using my-data as a one-size-fits-all square peg. Property is just out of place here.
I have mixed opinions on the "AI=theft" argument people make, and I generally lean towards "it's not theft", but I do see the argument.
If I put something on GitHub with a GPLv3 license, it's supposed to require that anyone with access to the binary also has access to the source code. The concern, if you think that it is theft, is that someone can train an LLM on your GPL code, and then a for-profit corporation can use the code (or any clever algorithms you've come up with), effectively "laundering" your GPL code and making money in the process. It would basically convert your code from copyleft to public domain, which I think a lot of people would have an issue with.
The thing is, LLMs aren’t redistributing your code. You’d have a minuscule chance of an LLM actually reproducing your code verbatim without major modifications.
Copyright and copyleft only deal with source code distribution. Your last sentence is not really true from a factual perspective.
I think if you really believe in the open source free software mentality that code should be available to help everyone and improvements to it should also be available and not locked up behind a corporate wall (e.g., a company using GPL code and releasing it with modifications without redistributing the source code), LLMs should be the least of your worries since they don’t do that action. On a literal level they don’t violate GPLv2/v3.
Perhaps copyright law needs new concepts to respond to this change in capability, but so far companies and individuals have had very little legal success suing AI companies for copyright violations. Direct violations have been rare and only get rarer over time as training methods evolve.
I find it very easy to understand: people don’t generally want to work for free to support billionaires, and they have few avenues to act on that. This is one of them.
There are no "commons" in this scenario. There are a few frontier labs owning everything (taking it without attribution), and they have the capability to take it away or increase prices to a point where it becomes a tool for the rich.
Nobody is doing this for the good of anything; it’s a money grab.
Were these contributions not a radical act against zero-sum games in the first place? And now you're gonna let the zero-sum people win by restricting your own outputs to similarly zero-sum endeavors?
I don't wanna look a gift horse in the mouth here. I'm happy to have benefited from whatever contributions were originally forthcoming and I wouldn't begrudge anybody for no longer going above and beyond and instead reverting to normal behavior.
I just don't get it. It's like you're opposed to people building walls, but you see a particularly large wall which makes you mad, so your response is to go build a wall yourself.
I'm writing a few DSLs a year at this point and I would very much like them to be part of the training data for LLMs!
https://www.softwareheritage.org/ will index it anyway.
Also, if you publish your code on your own server, it will be DDoSed to death by the many robots that will try to scrape it simultaneously.
that's why i don't add comments to my commits, i don't want them to know the reason for the changes.
Good. We don’t want code that people are possessive of in the software commons. If you are concerned about what people do with your output, then nobody should touch your output: too big a risk of drama.
We don’t own anything we release to the world.
"Good riddance" is a pretty lousy position to take re: volunteer work. It should be: "how can we fix this?"