Comment by Arcuru
20 hours ago
Personally, I think that the human directing the agent owns the copyright for whatever is produced, but the ability for the agent to build it in the first place is based off of stolen IP.
I'm concerned about the copyright 'washing' this enables though, especially in OSS, and I think the right thing for OSS devs to do is to try to publish resulting code with the strongest copyleft licensing that they are comfortable with - https://jackson.dev/post/moral-ai-licensing/
Funny how the copyright industry was able to spin copyright infringement into the pejorative "stealing". If you still have the item, what was stolen?
Dowling v. United States, 473 U.S. 207 (1985): The Supreme Court ruled that the unauthorized sale of phonorecords of copyrighted musical compositions does not constitute "stolen, converted or taken by fraud" goods under the National Stolen Property Act
I still find the idea that "learning" from code is "stealing" kind of ridiculous.
The "learning" isn't learning really. I mean it might be, but if you define learning to be a human endeavor than AI can't learn.
It's perfectly reasonable to say it's okay for humans to do something but not okay for a computer program to do the same thing. We don't have to equate AI to humans, that's a choice and usually a bad one.
7 replies →
Yes I guess there's also no such thing as stealing in torrents since the computer "learns" the data and returns it in a transcoded fashion so it's technically not a reproduction. Yes LLMs can reproduce passages from copyrighted works verbatim but that's only because it "learned" it and it's just telling you what it "knows".
The mental calisthenics required to justify this stuff must be exhausting.
7 replies →
I think it's absurd that we've jumped to the conclusion that backpropagation in neural networks should be legally treated the same as human learning.
I mean, I don't think I could find a better description for following the derivatives of error in reproducing a set of works than creating a "derivative work".
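To make that phrase literal, here is a toy sketch (purely illustrative, nothing like a real LLM): "training" is just gradient descent on the error in reproducing a set of works. The names `works`, `w`, and `lr` are made up for the illustration.

```python
# Toy illustration only: "learning" as following the derivative of the
# error in reproducing a training set. The "works" are just numbers and
# the "model" is a single parameter, but the mechanism is the same shape.
works = [3.0, 5.0, 4.0]   # the training set the model is tuned to reproduce
w = 0.0                    # a single model parameter
lr = 0.1                   # learning rate

for _ in range(200):
    # derivative of the mean squared reproduction error with respect to w
    grad = sum(2 * (w - x) for x in works) / len(works)
    w -= lr * grad         # step against the derivative of the error

print(round(w, 2))  # the parameter settles on a blend of the training works
```

Whether that process legally produces a "derivative work" is the open question; the sketch only shows that the training objective is, literally, error in reproducing the works.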
3 replies →
I find it more ridiculous to equate the act of a human learning with for-profit AI training without recompense to the authors of the training material.
If that were the case, then imagine having to give it back!
Learning, probably not.
Copy/pasting at scale, yes
12 replies →
If you can set a copyright trap and an LLM reproduces it I think it's pretty clear cut that it's more than just "learning".
I have seen LLMs do all sorts of crap which was clearly reproduction of training material.
This is also why people are most impressed with how much better it is at reproducing boilerplate rather than, say, imaginative new ideas.
1 reply →
If I “learned” your essay and handed it in, would you be happy with that?
Everybody has done a complete 180 on copyright protections. Before, nobody cared about downloading music, movies, TV shows, or pirating games. Now that copyright law is affecting them, they are gung-ho about protecting these billion-dollar companies' copyrights.
A more logical explanation would be that there are different opinions and those who complain are usually louder.
2 replies →
It's not just about "billion-dollar companies' copyrights", but also about voluntary copyleft free software. If I license my code under the GPL, I don't want other people or companies to just whitewash that code through LLMs and use it in their proprietary code.
1 reply →
I don't think it's unreasonable to consider it stolen potential profit, but agreed that's not how they spin it
“Stolen” as in “profited on IP against terms and conditions of the license”.
[dead]
[dead]
Copyright isn't some natural state of being though, it's something that's granted to people by the government to "promote the progress of science and useful arts". If copyright hinders things then I think it's reasonable that exceptions would be made.
Copyright laundering is an illusion.
If the LLM generates output that a court decides is sufficiently derivative, and especially (but not necessarily) if the LLM was trained on the source material being infringed, then whoever redistributes the derivative output is going to be liable for copyright infringement.
Creation of the LLM itself is transformative, but LLM output which infringes is not.
Is it true, then, that if someone took an entire code base from a vibe-coded app derived from a non-permissively licensed project, and claimed it was derived from an LLM and not stolen at all, they are not a thief because it came from the same place? Or are they a thief because someone else copyrighted it first? How do vibe coders protect themselves, not knowing who else has the same derivative code or who holds the copyright first? Or can't they?
The only thing a vibe coder should be able to copyright is the prompt text they wrote. Not the output of the LLM, only the text they wrote to instruct the LLM what to do. And even that is pretty iffy, because most of it, like "put a button on a page", is not copyrightable.
Do you think that human directing the agent owns copyright for any legal reason?
The case Community for Creative Non-Violence v. Reid (https://en.wikipedia.org/wiki/Community_for_Creative_Non-Vio...) solidified a Supreme Court opinion that commissioning a work and directing an author does not grant authorship to the commissioner of the work; it grants authorship to the person actually doing the work.
The author can grant authorship and copyright to the commissioner with a contract, but the monkey selfie case (and others) has solidified that only humans can be granted copyright. Since LLMs aren't human they can't hold copyright, and if the LLM doesn't have legal copyright then it has no legal right to assign copyright to you.
It depends on what level of creative control you had over the code.
Code is protected by copyright as a literary work. The method is not protected by copyright, that would be the domain of patents. What's protected are the words.
If you say "Claude, build me a website about X" then you do not have any creative control over the literary work Claude is producing. You just told a machine to write it for you. Nor, like a compiler, is it derivative of any other work that you wrote.
If, on the other hand, you are working jointly with Claude to make specific changes to the code on a line-by-line basis, then you will have no problem claiming copyright over the code. Claude in this case is acting as a tool, but there's still a human making decisions about the code.
In the case where you wrote a bunch of markdown and then told Claude to generate the corresponding code, but didn't have any involvement in writing the code itself, you could perhaps claim that the code is a derivative work of the markdown. A court would have to handle that on a case-by-case basis and evaluate how much control you exerted over the work.
> only humans can be granted copyright.
No, a copyright application can be filed with a corporation listed as the author. Watch for the copyright notice at the end of the next major movie you see.
However, until very recently the creative product must have been created by someone, so there was an implicit copyright over the product in the first place. With AI output, that might not continue to be true; we don't really know how it'll work out yet.
In any case, the corporation did not create the product, people created it and their contractual relationship with the corporation defined how the ownership of that work was managed. So, I don't find it too unusual that this element of personhood is available to corporations.
Interesting, though, that ownership of the code can still be transferred to the employer. So it's in the public domain (because not human authored) but owned by the employer (because the human and/or LLM was employed by the employer)? I don't really understand how this works.
Copyright works on derivative rules - is the component of the work unmistakenly derived from another copyrighted work.
Under the EU AI Act, at least, work done by AI is not granted copyright. But that does not mean copyright does not apply; it means the amount of work credited to the AI is set at 0% (a simplification). A human working off another's work, unless it's a perfect copy, will have "credit" for changes that are judged creative/transformative, meaning a human plagiarizing something can still claim some degree of authorship. An AI won't.
In a sense, the copyright status of the final work is a sort of "sum with dilution" where each work involved adds to the claims, but the AI's contribution is set at 0; the prompt, or further rework by a human, is not.
As for employer, details vary but generally "work for hire" rules and contracts do reassignment of material rights (in EU and some other places you can not reassign moral rights which are a different thing).
Note: IANAL
I think what this means is that the employee may not be the copyright owner for multiple reasons, which are possibly applicable simultaneously. It does not imply that the employer owns copyright over the work that is in public domain, which would be a contradiction.
> but the ability for the agent to build it in the first place is based off of stolen IP.
I honestly don't understand why the attitude that underlies this is so prevalent.
When I write code, what I write and how I write it is informed by having read countless source code files over my education and my career. Just as I ingest all that experience to fine-tune how my later code is written, so does the LLM from the code it's seen.
The immediate retort to that is that the LLM is looking at code that wasn't its to read. But I don't think that's a valid objection. Pretty much by definition, everything I've learned from has a copyright on it, and other than my own code on my own time, that copyright is owned by someone else. Much of the code that's built up my understanding has been protected by NDA, or even defense-department classifications: it wasn't mine in any way. But it still informs how I do all my future coding.
By analogy: I'm also an artist, especially since my retirement. My approach to photography was influenced by Ansel Adams, and countless other artists whose works I've seen displayed in museums, or in publications and online. My current approach to painting was inspired by Bob Ross and others, and the teachers who have helped me develop. I've taken pieces of what I've seen in all their work, and all of that comes out in my photos and paintings, to varying degrees.
I've taken ideas from others in code and in art, and produced something (hopefully!) different by combining those bits with my own perspective. I don't think anyone has a claim on my product because of this relationship.
Likewise, I know that many of my successors have learned from my code (heck, I led teams, wrote one book about software development!). And I hope that someday my artwork has developed to the point where there's something in it that's worth someone else's attention to assimilate. I've never for a minute - even decades before the advent of LLMs - hoped or even imagined that my work would remain locked up with me, and that the ideas would follow me to the grave.
As they say, we are all standing on the shoulders of giants. None of us would be able to achieve the tiniest fraction of what we have, without assimilating what has come before us. Through many layers of inheritance it's constantly being incorporated in subsequent works.
In a few decades at best, I'll be dead. It probably won't be very long after that when people even forget my name. But the idea that something I've done - my work in developing software systems, or in my photography and painting - will continue to have ripples through time, inspires me and gives me hope that I'll have some tiny shred of immortality beyond my personal demise.
Humans should have more legal privileges than machines, just as individuals should have more legal privileges than corporations. It's really as simple as that. I don't want to gripe around making up justifications, that's how the law should be and if it turns out not to be that, I'm going to be nettled.
I live in the UK, and most US law is based upon English common law, it's not some immutable code given to us from above. It's based upon assumptions and capabilities of the entities participating in the system at the time the law was codified. It can and should change to make more sense if those assumptions and capabilities shift massively.
I get the individual/corporation distinction, but how is a machine another tier here? It's a tool, it can't have any rights at all. The wielder has rights, and curtailing their rights depending on what tool they're using to exercise them seems strange. Potentially justifiable, but it's a different axis from the nature of the actor.
2 replies →
In many of those examples, there is payment to the creator of the works that others are learning from. Authors are paid for their books, when we listen to music on the radio the musician is paid royalties, etc. When you lead a team and mentor junior engineers you're being paid for your time.
The nature of the source material matters though. Training a model on open source software seems perfectly fair - it has explicitly been released to the public, and learning from the code has never been a contested use.
IMO the questions around coding models should be seen as less about LLMs and more as a subset of the conversation about large companies driving immense profits from the work of volunteers on open-source projects, i.e. it's more about open source than AI.
Scale and the ability to generate a livelihood of your creations and/or the ability to control how what you have created is used, for instance, to demand attribution.
The attitude is derived from a general animus many have towards AI companies. They resent the efficacy of AI because it devalues individual expertise.
I can't see how it's justifiable to say that training on data is the same as "stealing", when the same claim (that learned information a person could retain and reproduce constitutes copyright infringement) is the subject of many dystopian narratives, like this one, where once your brain is uploaded to the cloud you have to pay royalties on every media product you remember.
https://www.youtube.com/watch?v=IFe9wiDfb0E
This is the answer. People don't like having their livelihood threatened so they kick the thing that threatens it.
Part of how AI works is that it's just really complicated compression, you can get AI to write out Harry Potter novels word for word with the right prompting.
When it picks out a rare bit of code, it will simply be copying that code, illegally, and presenting it without attribution or any licenses. That is in fact breaking the law, but AI companies are too important for the law to apply to them.
There's been instances where models have spat out comments in code that mention original authors, etc., effectively outing itself as a copyright thief.
There's nothing anyone can do about it, but the suspicion is that the big companies have taken everyone's code on GitHub, without consent, and trained on it.
And now are spitting out big chunks of copyrighted code and presented it as somehow transformed even though all they've actually done is change a few variable names.
It is copyright theft, but because programmers are little people, not Disney, we don't have any recourse.
6 replies →
For another human being to look at my open source code, learn from it, get inspired by it, appreciate what I did, and let it influence their own creativity would bring me joy. That's why I open sourced it in the first place.
Few people ever actually read open source code, but I'd like to think on the rare occasions they do, they share a connection with the author. I know when I read somebody else's code, for me to understand it I have to be thinking about the problem the same way they were when they wrote it. I feel empathy with them and can sometimes picture the struggle, backtracking, and eureka moments they went through to come up with their solution.
Somehow I don't get the same warm fuzzy feelings about a machine powered by investor money ingesting my work automatically, in milliseconds, and coldly compressing it down to a few nudges on a few weights out of trillions of parameters. All so the machine can produce outputs on-demand for lazy users who will never know of me or appreciate my little contribution, and ultimately for the financial benefit of some billionaires who see me as an obsolete waste of space.
I guess I'm just irrational that way.
We're moving into the 'industrial age of software'. Your exact issue, of bespoke, well-thought-out and well-crafted code, is one that craftsmen felt at the beginning of the industrial age. Now, parts are designed and churned out by machines that no one sees or cares about (generally speaking). This is where we are going with software, and production at a truly industrial scale has its place.
And so does well-crafted bespoke software.
The engineers who built the foundation for the industrial expansion of our forefathers went through the same exact thing we're going through now. They look at what existed, and use it to inform their efforts. This is what LLMs do.
I'm not attempting to moralize here, just to comment on the parallels. Do I agree that a craftsman's work should be consumed by the juggernauts with no second thought given? No. I think it's a shame. But I also think the output will never match the artisans who practice now. By the very nature of the machines we employ, we cannot match the skill or thought that goes into bespoke code.
1 reply →
You’re confusing yourself with a commercial product. You’re not a product that was created by other human beings based on someone else’s IP.
> You’re not a product that was created by other human beings based on someone else’s IP.
It turns out that's false. We know that genes are patentable; remember back during the Human Genome Project, when there was such a rush to patent them? So genes are IP. (This seems bizarre to me, since they're patenting something that was found just sitting there, but this is what the system says right now.)
Well, two other humans (aka mom and dad) did create me, based on those patentable genes (and most likely including some genes that were, in fact, patented).
I'm not sure what to conclude from all of that, but I do think that it invalidates your argument.
> When I write code, what I write and how I write it is informed by having read countless source code files over my education and my career. Just as I ingest all that experience to fine-tune how my later code is written, so does the LLM from the code it's seen.
You are presumably human. We have granted humans specific exemptions in copyright law. We have not granted that to LLMs. Why are we so eager to?
We did not grant humans exemptions in copyright law.
We gave humans a temporary monopoly on certain uses, under rules little understood by laymen even when their livelihood depends on them.
1 reply →
Ok, so I use the LLM. I use the tool. Can I now apply the exemption to me?
Are you telling me that I can use the thing, but I can't use it if I process it through an LLM? It gets slippery, fast.
2 replies →
I'm not sure where in our lawbooks there are laws that specifically target humans to the exclusion of human-operated tools.
There's also a TON of irony here. What an about face it is, for the community at large* to switch from "information wants to be free, we support copyleft and FOSS" to leaning so heavily on an incredibly conservative reading of IP law.
2 replies →
Because that allows us to create useful tools that we didn't have before. For me it feels like a carpenter going from a hand-saw to an electrical saw. Still requires the skills of a good carpenter, but faster and easier.
3 replies →
I've created my own DSL, and instruct Claude Code how to generate code for this DSL using skills.
Since this is a new language, and not documented on the web nor on Github, Claude's ability is not based off of stolen IP. At best it's trained on other language concepts, just like we can train ourselves on code on GitHub.
Maybe a good reason to create a new programming language?
Interesting, but I still do not think this is that easy. The AI model is still trained on some existing works, and the code it generates in the new DSL or programming language is still based on higher-level ideas and expressions it consumed during training. You have added just one more level of indirection. The output can no longer be a verbatim copy of an existing work or of non-trivial snippets; however, the output may still carry "expression" that is substantially similar to something pre-existing.
Note: IANAL. The above is just from my current understanding.
I wonder what OSS licenses would have looked like if we saw all of this coming.
I could possibly see an argument for the owner being whoever paid for the tokens used, but honestly I think the argument for that is weaker than what you're suggesting; I'm merely playing devil's advocate here.
I don't think there's even a valid argument for any other ownership model, or at least none that I can think of.
I see the argument for whoever paid for the tokens. Or in the case of a free AI usage, the person who sent the prompt (or whoever they are acting on behalf of, i.e. the company they are working for at the time).
The primary issue being that it's all built on stolen data in the first place.
Even taking the least generous interpretation of what LLMs do and saying they're just "copy/pasting others' code" it's still not stealing because the original still exists and presumably still makes money. The original has to be gone for theft to have occurred.
In order to have a sane conversation about this we have to all agree not to lie.
1 reply →
No, that human owns the copyright on the prompt, not on the work product.
If that were true, a developer would own copyright over the source code but nothing on the compiled binaries, and I could download practically all software available as compiled binaries and use it for free.
Indeed a developer owns copyright over the source code and over the compiled binaries, because there is no expansion happening here, just a translation from one format into another, the kind of thing that has been ruled copyrightable for as long as copyright has existed. The same goes for translations from one human language into another, and anybody with knowledge of more than one language will happily acknowledge that translating is hard work. Even so, the translator does not hold copyright on the result; at best they can say they have created a derived work, and it is the original author who continues to hold copyright.
Compilation and translation happen in a generic manner and do not rely on a mountain of other IP; a compiler is really just a transformative tool that happens to do something useful. Someone constructed it to be a very precise translation, to the point that any mistakes in it are called bugs and we fix them to keep the process deterministic. Translators try hard to 'get it right' too: to affect the intentions of the original author as little as possible.
When you use a model loaded up with noise or that you have trained exclusively on code that you actually wrote I think a strong case could be made that you own the copyright on that work product. But when you train that model on other people's work, especially without their consent or use a model that has been trained in that way you lose your right to call the output of that model yours.
You did not write it, and the transformative process requires terabytes of other people's IP and only a little bit by you.
As soon as you can prove that your contribution substantially outweighs the amount of IP contributed in total you would have a much stronger case.
5 replies →
> If that were true, a developer may own copyright over the source code, but nothing on the compiled binaries, and I could download practically all software available as compiled binaries and use for free.
If the compiled binaries (output) were produced by running the input (source code) over every program written, then sure.
But that's not what's happening with compilers, is it? The output of a prompt is dependent on the copyrighted work of others every single time it is run.
The output of a compiler is not dependent on the copyrighted output of every other program.
1 reply →
That's not how it works. The human using the tool (like Claude Code, etc.) owns the copyright of the code generated.
No, you are wrong about this.
See:
https://technophilosoph.com/en/2025/02/07/ai-prompts-and-out...
If you have a more recent citation referring to case law that states the opposite then that would be great but afaik this article reflects the current state of affairs.
The human using the tool creates a prompt, there is then an automatic transformation of the prompt into code. Such automatic transformation is generally accepted as not to create a new work (after all, anybody else inputting the same prompt would have a reasonable expectation of generating the same output modulo some noise due to versioning and possibly other local context).
Claude Code, and AI-generated code in general, does not at present create a new work. But the prompt, the part which you input, may be sufficiently creative to warrant copyright protection.
7 replies →
So I’m responsible for pushing the giant boulder at the top of the hill.
The humans at the bottom who were crushed should blame the boulder, which happened to be moving.
I'm not sure what point you are trying to make.
11 replies →
I agree with this sentiment, because the person directing the agent can still direct it in a way where it'll produce a better or worse output than another person directing it.
The LLM is just a database. It's like saying 'I own the copyright to what comes out of an API because I crafted the query' or 'I own the copyright to the responses I get from the bots on the Starship Titanic because I crafted the message they respond to'.
This interpretation makes sense. I think even the 'fair use' clause in the US doesn't protect LLMs. One argument I've heard often is that LLMs synthesize their training set to produce novel output in the same way a human would. That may be the case, but legally an LLM isn't a human. You can't look at the output of an LLM and say that it's 'fair use' with respect to its training set; it hasn't been established that AI has the same 'fair use' rights as a human does. It's already a stretch that companies have this right, let alone an AI agent. Anyway, that's just one problem.

This also ignores the fact that the researchers who compiled the training set COPIED the original copyrighted data in order to produce it. They either copied the entire work into the training set or fed the entire work directly into the LLM; in either case, at some point, the entire work was copied verbatim into the LLM's input layer before it was ingested by the AI. The researchers copied the copyrighted content without permission.
Also, when it comes to code, the case is even more damning, because the vast majority of the code LLMs are trained on was not only copyrighted but subject to an MIT license (at best), and even the MIT license, which is the most permissive license in existence, still says clearly:
"Permission is hereby granted, free of charge, to any person obtaining a copy of this software"
The word 'person' is used very intentionally here.
I think there should be several kinds of AI taxes which should be distributed to all copyright holders. There should be a tax to go to writers (and book authors), a tax to go to open source developers and a tax for the general population to distribute as UBI to account for small-form content like comments and photography...
People invested a lot of time building their entire careers around the assumption of copyright protection; so for it to be violated on such a scale would be a massive betrayal.
I find the idea that code could be copyrightable weak. There are only so many ways to write a for loop. Similarly, you can't copyright schematics (apart from the exact visual representation as a form of art). Code is just a schematic.
Note: IANAL
Copyright already precludes short phrases for the same reason: there are only so many ways in which short phrases can be produced. The moment a work becomes large enough (AFAIK, the threshold is not precisely defined), that reasoning fails to apply.
The Google-Oracle lawsuit did not decide whether APIs (when large in number) are copyrightable or not.
Let me get this straight: since there are only so many ways to write a for loop, you doubt that for loops are copyrightable. From this you conclude that code, in general, isn't copyrightable?
That's like saying "there's only so many ways to greet your neighbor, so any text that simply greets your neighbor isn't copyrightable – and therefore no text is copyrightable".
You can think that's how it should be. But that's not necessarily how it is. I'm reminded of the famous monkey selfie copyright dispute [1]. A photographer set up a camera and gave it to a monkey, and after a legal dispute, courts decided nobody owned the copyright.
I can totally see this applying here as well.
Now this doesn't resolve the issue of AIs being trained on copyrighted works it had no rights to. The counterargument is that this is a derivative or transformative work but I don't believe that's settled law at all.
[1]: https://en.wikipedia.org/wiki/Monkey_selfie_copyright_disput...