Comment by Xirdus
6 hours ago
People who break the social contract are the ones responsible for breaking the social contract, not the ones who take steps in response to the social contract being broken.
So the questions here are (a) is any generally accepted social contract actually being broken, and (b) if so, who are the ones who are breaking it?
The contract behind open source was something like (GPL):
"If you copy my work, you should share your work too."
or at minimum (MIT):
"If you copy my work, you should credit me."
I think it is no longer under dispute that the legal contract is satisfied by LLMs. The AI companies won and will continue to win.
But we are talking about a social contract, which is not quite the same thing. The social contract is what leads some devs who previously enjoyed publishing their work openly to no longer feel the same way. What did the authors mean by "copy"? Did they mean literally CTRL+C, CTRL+V or something broader?
This is a matter of opinion which only each individual creator can answer. For me, copying meant something like:
"To reproduce the function of my work, dependent on my having published it, without effort or understanding of your own"
Ten years ago this basically required doing a CTRL+C, CTRL+V so there was no need to be more specific. Anybody who did enough work to, say, rewrite in another language (with that language's idioms), met the bar of clause 3. Now AI enables a form of "copying" that matches my definition, without the user even being aware of whose works they are copying. It perfectly launders the origins of its output. It can write an FFmpeg clone in Rust for you that would appear to be a novel work.
Of course, I cannot say that removing my own little bits and pieces of open source code would make a dent in AI's capability.
But I do strongly believe that if all the code that was published by authors with the same mindset was unavailable, Claude would be a far weaker developer.
> But we are talking about a social contract, which is not quite the same thing. The social contract is what leads some devs who previously enjoyed publishing their work openly to no longer feel the same way.
Perhaps this illustrates a fissure that was always lurking under the surface, then. The social contract that I've personally always attributed to FOSS communities was that attempting to restrict how people downstream of you use code is illegitimate, and that licenses like the GPL were meant to use copyright law to achieve something that resembles the state of affairs that might exist if copyright didn't exist in the first place. That's what the whole concept of "copyleft" always seemed to imply.
Now we have a new class of technologies that is admittedly fraught with a wide range of risks and pitfalls, but also a lot of promise to enable people to actually put the "four freedoms" into practice in ways they couldn't before, and we're seeing people who have normative opinions about AI derived from other, unrelated principles trying to circle the wagons and exclude those use cases. That is what seems like a breach of the social contract as I've always understood it.
> Did they mean literally CTRL+C, CTRL+V or something broader?
Given that FOSS licenses were always constructed to function within applicable copyright law, I don't see how they could mean anything else. "Literal CTRL+C, CTRL+V" is the only thing copyright has ever applied to, and the whole point of "copyleft" was to lessen the restrictions on even that.
Are you asking how AI coding agents, the companies selling them and the individuals using them break the FOSS social contract (copyleft, attribution, upstreaming), or are you disputing that they do?
Both would resolve to the same question, no?
There seems to be an implicit premise here that any work generated by an LLM whose training data includes a particular bit of code itself constitutes a redistribution of that code. I've yet to encounter any strong arguments substantiating this premise as a general principle, and my own suspicion is that it is not valid as a general principle, given the nature of how LLMs operate.
It's certainly possible that specific instances of LLMs lazily copy-pasting code from public repos exist, and the extent to which this is happening can be substantiated by empirical examples, so if you have any to point to, I'd be interested in looking at them. However, where this is happening, it ought to be regarded as a failure mode of LLMs, and not something that implicates the underlying nature of LLMs, given that their intended purpose is to function as stochastic generators that do not merely copy-paste input data.
My initial feeling here is that using open-source code to train LLMs is not per se a violation of the generally accepted FOSS social contract, but rather that attempting to restrict specific use cases of FOSS-licensed code on the basis of normative opinions unrelated to the license terms is a violation, or at least a rejection, of that social contract. I'm not fully committed to this position, though, and would welcome well-reasoned arguments to the contrary.
Yes, and obviously: bots crushing servers in strict contravention of the robots.txt rules.
“No, no, what was she wearing?”
People who take steps in response to the social contract being broken are the ones responsible for the steps they've taken, not the ones who break the social contract.