AI assistance when contributing to the Linux kernel


Basically the rules are that you can use AI, but you take full responsibility for your commits and the code must comply with the license.

That's... refreshingly normal? Surely something most people acting in good faith can get behind.

  • But then if AI output is not under the GNU General Public License, how can it become so just because a Linux developer adds it to the code base?

    • AIs are not human, and therefore their output is not a human-authored contribution, and only human-authored works are covered by copyright. The work might hypothetically infringe on other people's copyright, but such an infringement does not happen until a human decides to create and distribute a work that somehow integrates that generated code or text.

      The solution documented here seems very pragmatic. You as a contributor simply state that you are making the contribution under the GPLv2 and that it does not infringe on other people's work. And you document the fact that you used AI, for transparency.

      There is a lot of legal murkiness around how training data is handled, around the output of the models, and even around the models themselves. Is something that in no way, shape, or form resembles a copyrighted work (i.e. a model) actually distributing that work? The legal arguments will probably take a long time to settle, but it seems the fair use concept offers a way out. You might create potentially infringing work with a model that may or may not be covered by fair use, but that would be your decision.

      For small contributions to the Linux kernel it would be hard to argue that a passing resemblance of, say, a for loop in the contribution to some for loop in somebody else's code base is anything other than coincidence or fair use.
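
      To make that concrete: two contrived, "independently written" zeroing loops (invented here purely for illustration) end up nearly identical, because idioms this generic leave no room for originality:

        /* author A */
        void zero_a(int *buf, int n)
        {
            for (int i = 0; i < n; i++)
                buf[i] = 0;
        }

        /* author B, who never saw author A's code */
        void zero_b(int *arr, int len)
        {
            for (int j = 0; j < len; j++)
                arr[j] = 0;
        }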


    • Tab complete does not produce copyrightable material either. Yet we don't require software to be written in nano.

  • But why should AI then be attributed, if it is merely a tool?

    • Having an honesty-based tag could be the only way to monitor impact, or to go after a fix in code bases if things go south.

      That is, at the moment:

      - Nobody knows for sure what agents might add, nor their long-term effects on codebases.

      - It's at best unclear whether AI content in a codebase can be reliably detected automatically.

      - Even if it's not malicious, at least some of its contributions are likely to be deleterious and to pass undetected by human review.

    • It makes sense to keep track of which model wrote what code, to look for patterns, behaviors, etc.

    • It isn't?

      > AI agents MUST NOT add Signed-off-by tags. Only humans can legally certify the Developer Certificate of Origin (DCO).

      They mention an Assisted-by tag, but that also contains stuff like "clang-tidy". Surely you're not interpreting that as people "attributing" the work to the linter?

  > Signed-Off ...
  > The human submitter is responsible for:
    > Reviewing all AI-generated code
    > Ensuring compliance with licensing requirements
    > Adding their own Signed-off-by tag to certify the DCO
    > Taking full responsibility for the contribution

  > Attribution: ... Contributions should include an Assisted-by tag in the following format:
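
Concretely, a patch footer following this scheme might look something like the lines below (the actual format string is elided in the quote above, so this is a hypothetical rendering, with an invented tool name and author):

    Assisted-by: ExampleCodeModel v2 (hypothetical tool name)
    Signed-off-by: Jane Developer <jane@example.org>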

Responsibility assigned where it should lie. Expected no less from Torvalds, the progenitor of Linux and Git. No demagoguery, no b*.

I am sure that this was reviewed by attorneys before being published as policy, because of the copyright implications.

Hopefully this will set the trend and provide definitive guidance for the many devs who saw not only the utility of AI assistance but also the acrimony from some quarters, and were left fence-sitting.

Glad to see the common-sense rule that only humans can be held accountable for code generated by AI agents.

This does nothing to shield Linux from responsibility for infringing code.

This is essentially like a retail store saying the supplier is responsible for eliminating all traces of THC from their hemp when they know that isn’t a reasonable request to make.

It’s a foreseeable consequence. You don’t get to grant yourself immunity from liability like this.

  • Quite a lot of companies use and release AI-written code; are they all liable?

    • 1. Almost definitely if discovered

      2. Infringement in closed source code isn’t as likely to be discovered

      3. OpenAI's and Anthropic's enterprise agreements indemnify companies (essentially, pay their damages) for copyright issues.

    • Yep, and honestly it's going to come up in ways other than lawsuits.

      I've worked at a company that was asked as part of a merger to scan for code copied from open source. That ended up being a major issue for the merger. People had copied various C headers around in odd places, and indeed stolen an odd bit of telnet code. We had to go clean it up.

> All code must be compatible with GPL-2.0-only

Am I being too pedantic if I point out that it is quite possible for code to be compatible with GPL-2.0 and other licenses at the same time? Or is this a term that is well understood?
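
For what it's worth, the kernel tree already contains dual-licensed files, and their SPDX headers express exactly that kind of simultaneous compatibility. The expression form here is real kernel practice, though this particular line is a made-up example:

    /* SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause */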

We've seen in the past, for instance in the world of compliance, that if companies/governments want something done or make a mistake, they just have a designated person act as a scapegoat.

So what's preventing lawyers/companies having a batch of people they use as scapegoats, should something go wrong?

> All code must be compatible with GPL-2.0-only

How can you guarantee that when AI has been trained on a world full of multiple licenses, and even closed-source material used without permission of the copyright owners... I confirmed that with several AIs just now.

  • You take responsibility. That means if the AI messes up, you get punished. No pushing blame onto the stupid computer. If you're not comfortable with that, don't use the AI.

    • There’s no reasonable way for you to use AI-generated code and guarantee it doesn’t infringe.

      The whole "use it, but if it doesn't behave as expected, it's your fault" is a ridiculous stance.


    • > That means if the AI messes up

      I'm not talking about maintainability or reliability. I'm talking about legal culpability.

  • Wait for court cases I suppose - not really Linus Torvalds' job to guess how they'll rule on the copyright of mere training. Presumably having your AI actually consult codebases with incompatible licenses at runtime is more risky.

> All contributions must comply with the kernel's licensing requirements:

I just don't think that's realistically achievable, unless the models themselves can introspect on the code and detect any potential license violations.

If you get hit with a copyright violation in this scheme I'd be afraid that they're going to hammer you for negligence of this obvious issue.

This feels like the OSS community is giving up.

LLMs are lossily compressed models of code and other text (often mass-scraped despite explicit non-consent) which carry licenses that almost always require attribution and very often impose other conditions. Just a few weeks ago a SOTA model was shown to reproduce non-trivial amounts of licensed code[0].

The idea of intelligence being emergent from compression is nothing new[1]. The trick here is giving up on completeness and accuracy in favor of a more probabilistic output which

1) reproduces and interpolates between patterns in the training data, while not always producing verbatim copies

2) serves as a heuristic for searching the solution space, further guided by deterministic tools such as compilers and linters (a sketch of this generate-and-filter loop follows below) - the models themselves quite often generate complete nonsense, including making up non-existent syntax in well-known mainstream languages such as C#.
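
As a rough sketch of point 2 (not anybody's actual pipeline): the model acts as the proposal heuristic and the compiler as the deterministic filter. llm_propose() is a made-up stand-in for a model call; here it just writes a canned candidate to a file:

    #include <stdio.h>
    #include <stdlib.h>

    /* Stand-in for a model call; real output would vary wildly. */
    static void llm_propose(const char *path, int attempt)
    {
        FILE *f = fopen(path, "w");

        if (!f)
            exit(1);
        fprintf(f, "int answer_%d(void) { return %d; }\n", attempt, attempt);
        fclose(f);
    }

    int main(void)
    {
        for (int attempt = 0; attempt < 5; attempt++) {
            llm_propose("candidate.c", attempt);
            /* Deterministic check: reject anything that won't even parse. */
            if (system("cc -fsyntax-only candidate.c") == 0) {
                printf("attempt %d passed the compiler filter\n", attempt);
                return 0;
            }
        }
        return 1; /* no candidate survived the filter */
    }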

I strongly object to anthropomorphising text transformers (e.g. "Assisted-by"). It encourages magical thinking even among people who understand how the models operate, let alone the general public.

Just like stealing fractional amounts of money[3] should not be legal, violating the licenses of the training data by reusing fractional amounts from each should not be legal either.

[0]: https://skeptics.stackexchange.com/questions/14925/has-a-pro...

  • > Just like stealing fractional amounts of money[3] should not be legal, violating the licenses of the training data by reusing fractional amounts from each should not be legal either.

    I think you'll find that this is not settled in the courts, depending on how the data was obtained. If the data was obtained legally, say a purchased book, courts have been finding that using it for training is fair use (Bartz v. Anthropic, Kadrey v. Meta).

    Morally the case gets interesting.

    Historically, there was no such thing as copyright. The English 1710 Statute of Anne establishing copyright as a public law was titled 'for the Encouragement of Learning' and the US Constitution said 'Congress may secure exclusive rights to promote the progress of science and useful arts'; so essentially public benefits driven by the grant of private benefits.

    The Moral Bottom Line: if you didn't have to eat, would you care who copies your work, as long as you get credited?

    The more people copy your work with attribution, the more famous you'll be. Now that's the currency of the future*. [1]

    You'll do it for the kudos. [2][3]

      *Post-Scarcity Future. 
      [1] https://en.wikipedia.org/wiki/Post-scarcity
      [2] https://en.wikipedia.org/wiki/The_Quiet_War, et al.
      [3] https://en.wikipedia.org/wiki/Accelerando

    • > The Moral Bottomline: if you didn't have to eat, would you care about who copies your work as long as you get credited?

      Yes.

      I have 2 issues with "post-scarcity":

      - It often implicitly assumes humanity is one homogeneous group where this state applies to everyone. In reality, if post-scarcity is possible, some people will be lucky enough to have the means to live that lifestyle while others will still be dying of hunger, exposure, and preventable diseases. All else being equal, I'd prefer to be in the first group, and my chance for that is being economically relevant.

      - It often ignores that some people are OK with having enough while others have a need to have more than others, no matter how much they already have. The second group is the largest cause of exploitation and suffering in the world. And the second group will continue existing in a post-scarcity world and will work hard to make scarcity a real thing again.

      ---

      Back to your question:

      I made the mistake of publishing most of my public code under GPL or AGPL. I regret it because even though my work has brought many people some joy, and a bit of it was perhaps even useful, it has also been used by people who actively enjoy hurting others, who have caused measurable harm, and who will continue causing harm as long as they're able to, in small part enabled by my code.

      Permissive licenses are socially agnostic - you can use the work and build on top of it no matter who you are and for what purpose.

      (A)GPL is weakly pro-social - you can use the work no matter what, but you can only build on top of it if you give back. This produces some small but non-zero social pressure (enforced by violence through governments) which benefits those who prefer cooperation over competition.

      What I want is a strongly pro-social license - you can use or build on top of my work only if you fulfill criteria I specify such as being a net social good, not having committed any serious offenses, not taking actions to restrict other people's rights without a valid reason, etc.

      There have been attempts in this direction[0], but none very successful.

      In a world without LLMs, I'd be writing code under such a license, but more clearly specified, even if I had to write my own. Yes, a lawyer would do a better job, but that does not mean anything written by a non-lawyer is completely unenforceable.

      With LLMs, I have stopped writing public code at all, because the way I see it, it just makes people much richer than me even richer, at a much faster rate than I can ever achieve myself. It just makes inequality worse. And with inequality, exploitation and oppression tend to follow.

      [0]: https://json.org/license.html

  • > I strongly object to anthropomorphising text transformers (e.g. "Assisted-by").

    I don't think this is anthropomorphising, especially considering they also include non-LLM tools in that "Assisted-by" section.

    We're well past the Turing test now; whether these things are actually sentient or not is of no pragmatic importance if we can't distinguish their output from a sentient creature's, especially when it comes to programming.

    • Would you say "assisted by vim" or "assisted by gcc"?

      It should be either something like "(partially/completely) generated by" or if you want to include deterministic tools, then "Tools-used:".

      The Turing test is an interesting thought experiment, but we've seen it's easy for LLMs to sound human-like and make authoritative, convincing statements despite being completely wrong or full of nonsense. The Turing test is not a measure of intelligence, at least not of artificial intelligence. (Though I find it quite amusing that the point at which a person chooses to call LLMs intelligent is somewhat indicative of their own intelligence level.)

      > whether these things are actually sentient or not is of no pragmatic importance if we can't distinguish their output from a sentient creature, especially when it comes to programming

      It absolutely makes a difference: you can't own a human, but you can own an LLM (or a corporation, which is IMO just as wrong as owning a human).

      Humans have needs which must be continually satisfied to remain alive. Humans also have a moral value (a positive one - at least for most of us) which dictates that being rendered unable to remain alive is wrong.

      Now, what happens if LLMs have the same legal standing as humans and are thus able to participate in the economy in the same manner?


Good. The BSDs should follow suit. It is unreasonable to expect any developer not to use AI in 2026.

Fork the kernel!

Humans for humans!

Don't let skynet win!!!

  • > Fork the kernel!

    pre "clanker-linux".

    I am more intrigued by the inevitable Linux distro that will refuse any code that has AI contributions in it.

Linux has fallen. Linus Torvalds is now just another vibe coder. I give it less than a year, or maybe a month, until Linux gets vibe-coded patches approved by LLMs.

Open source is dead, having had its code stolen for use by vibe-coding idiots.

Make no mistake, this is the end of an era.

  • Linus is the original vibe coder. He barks orders at a cadre of human contributor agents and subsystem maintainer agents until the code looks the way he likes.

    • > Linus is the original vibe coder.

      LoL. Jesting aside, could this be considered slander against the author of Git (Torvalds, in case you didn't know)? ;-)

      OpenHub lists Linus Torvalds as having made 46,338 commits: 45,178 for Linux, 1,118 for Git. His most recent commit was 17 days ago. [1]

      That is a far cry from a vibe-coder, no? :-)

      [1] https://openhub.net/accounts/9897