Comment by fc417fc802

2 months ago

You make several good points, and I appreciate that they appear well thought out.

> What previously took creative human work now became a mechanical transformation of input data.

At which point I find myself wondering if there's actually a problem. If it was previously permitted due to the presence of creative input, why should automating that process change the legal status? What justifies treating human output differently?

> then free software is dead. No more freedom to inspect and modify.

It seems to me that depends on the ideological framing. Consider a (still entirely hypothetical) world where anyone can receive approximately any software they wish with little more than a Q&A session with an expert AI agent. Rather than free software being dead, such a scenario would appear to obviate the vast majority of needs that free software sets out to serve in the first place.

It seems a bit like worrying that free access to a comprehensive public transportation service would kill off a ride sharing service. It probably would, and the end result would also probably be a net benefit to humanity.

9 comments

fc417fc802

jacquesm 2 months ago

> At which point I find myself wondering if there's actually a problem. If it was previously permitted due to the presence of creative input, why should automating that process change the legal status? What justifies treating human output differently?

fc417fc802 2 months ago
Yes that's what the law currently says. I'm asking if it ought to say that in this specific scenario.
Previously there was no way for a machine to do large swaths of things that have now recently become possible. Thus a law predicated on the assumption that a machine can't do certain things might need to be revisited.
- martin-t 2 months ago
  
  This is the first technology in human history which only works if you use an exorbitant amount of other people's work (without their consent, often without even their knowledge) to automate their work.
  There have been previous tech revolutions but they were based on independent innovation.
  > Thus a law predicated on the assumption that a machine can't do certain things might need to be revisited.
  Perhaps using copyright law for software and other engineering work might have been a mistake but it worked to a degree me and most devs were OK with.
  Sidenote: There is no _one_ copyright law. IANAL but reportedly, for example datasets are treated differently in the US vs EU, with greater protection for the work that went into creating a database in the EU. And of course, China does what is best for China at a given moment.
  There's 2 approaches:
  1) Either we follow the current law. Its spirit and (again IANAL) probably the letter says that mechanical transformation preserves copyright. Therefore the LLMs and their output must be licensed under the same license as the training data (if all the training data use compatible licenses) or are illegal (if they mixed incompatible licenses). The consequence is that very roughly a) most proprietary code cannot be used for training, b) using only permissive code gives you a permissively licensed model and output and c) permissive and copyleft code can be combined, as long as the resulting model and output is copyleft. It still completely ignores attribution though but this is a compromise I would at least consider being OK with.
  (But if I don't get even credit for 10 years of my free work being used to build this innovation, then there should be a limit on how much the people building the training algorithms get out of it as well.)
  2) We design a new law. Legality and morality are, sadly, different and separate concepts. Now, call me a naive sucker, but I think legality should try to approximate morality as closely as possible, only deviating due to the real-world limitations of provability. (E.g. some people deserve to die but the state shouldn't have the right to kill them because the chance of error is unacceptably high.) In practice, the law is determined by what the people writing it can get away with before the people forced to follow it revolt. I don't want a revolution, but I think for example a bloody revolution is preferable to slavery.
  Either way, there are established processes for handling both violations of laws and for writing new laws. This should not be decided by private for-profit corporations seeing whether they can get away with it scot-free or trembling that they might have to pay a fine which is a near-zero fraction of their revenue, with almost universally no repercussions for their owners.

martin-t 2 months ago

> What justifies treating human output differently?

Human time is inherently valuable, computer time is not.

One angle:

The real issue is how this is made possible. Imagine an AI being created by a lone genius or a team of really good programmers and researchers by sitting down and just writing the code. From today's POV, it would be almost unimaginably impressive but that is how most people envisioned AI being created a few decades ago (and maybe as far as 5 years ago). These people would obviously deserve all the credit for their invaluable work and all the income from people using their work. (At least until another team does the same, then it's competition as normal.)

But that's not how AI is being created. What the programmers and researchers really do it create a highly advanced lossy compression algorithm which then takes nearly all publicly available human knowledge (disregarding licenses/consent) and creates a model of it which can reproduce both the first-order data (duh) and the higher-order patterns in it (cool). Do they still deserve all the credit and all the income? What if there's 1k researchers and programmers working on the compression algorithm (= training algorithm) and 1B people whose work ("content") is compressed by it (= used to train it). I will freely admit that the work done to build the algorithm is higher skilled than most of the work done by the 1B people. Maybe even 10x or 100x more expensive. But if you multiply those numbers (1k * 100 vs 1B), you have to come to the conclusion that the 1B people deserve the vast majority of the credit and the vast majority of the income generated by the combined work. (And notice when another team creates a competing model based on the same data, the share by the 1B stays the same and the 1k have to compete for their fraction.)

Another angle:

If you read a book, learn something from it and then apply the knowledge to make money, you currently don't pay a share to the author of the book. But you paid a fixed price for the book, hopefully. We could design a system where books are available for free, we determine how much the book helped you make that money, and you pay a proportional share to the author. This is not as entirely crazy as it might sound. When you cause an injury to someone, a court will determine how much each party involved is liable and there are complex rules (e.g. https://en.wikipedia.org/wiki/Joint_and_several_liability) determining the subsequent exchange of money. We could in theory do the same for material you learn from (though the fractions would probably be smaller than 1%). We don't because it would be prohibitively time consuming, very invasive, and often unprovable unless you (accidentally) praise a specific blog post or say you learned a technique from a book. Instead, we use this thing called market capitalism where the author sets a price and people either buy the book or not (depending on whether they think it's worth it for them), some of them make no money as a result, some make a lot, and we (choose to) believe that in aggregate, the author is fairly compensated.

Even if your blog is available for anyone to read freely, you get compensated in alternative ways by people crediting you and/or by building an audience you can influence to a degree.

With LLMs, there is no way to get the companies training the models to credit you or build you an audience. And even if they pay for the books they use for training, I don't believe they pay enough. The price was determined before the possibility of LLM training was known to the author and the value produced by a sufficiently sophisticated AI, perhaps AGI (which they openly claim to want to create) is effectively unlimited. The only way to compensate authors fairly is to periodically evaluate how much revenue the model attracted and pay a dividend to the authors as long as that model continues to be used.

Best of all, unlike with humans, the inner workings of a computer model, even a very complex one, can be analyzed in their entirety. So it should be possible to track (fractional) attribution throughout the whole process. There's just no incentive for the companies to invest into the tooling.

---

> approximately any software they wish with little more than a Q&A session with an expert AI agent

Making software is not just about writing code, it's about making decisions. Not just understanding problem and designing a solution but also picking tradeoffs and preferences.

I don't think most people are gonna do this just like most people today don't go to a program's settings and tweak every slider/checkbox/dropdown to their liking. They will at most say they want something exactly like another program with a few changes. And then it's clearly based on that original program and all the work performed to find out the users' preferences/likes/dislikes/workflows which remain unchanged.

But even if they genuinely recreate everything, then if it's done by an LLM, it's still based on work of others as per the argument above.

---

> the end result would also probably be a net benefit to humanity.

Possibly. But in the case of software fully written by sufficiently advanced LLMs, that net benefit would be created only by using the work of a hundred million or possibly a billion of people for free and without (quite often against) their consent.

Forced work without compensation is normally called slavery. (The only difference is that our work has already been done and we're "only" forced to not be able to prevent LLM companies from using it despite using licenses which by their intent and by the logic above absolutely should.)

The real question is how to achieve this benefit without exploiting people.

And don't forget such a model will not be offered for free to everyone as a public good. Not even to those people whose data was used to train it. It will be offered as a paid service. And most of the revenue won't even go to the researchers and programmers who worked on the model directly and who made it possible. It will go to the people who contributed the least (often zero) technical work.

---

This comment (and its GP), which contains arguments I have not seen anywhere else, was written over an hour long train ride. I could have instead worked remotely to make more than enough money to pay for the train ride. Instead, I write this training data which will be compressed and some patterns from it reproduced, allowing people I will never know and who will never know me to make an amount of money I have no chance quantifying and get nothing from. Now, I have to work some other hour to pay for the train ride. Make of that what you will.

jacquesm 2 months ago
One of your remarks regarding attribution and compensation goes back to 'Xanadu' by the way, if you are not familiar with it that might be worth reading up on (Ted Nelson). He obviously did this well before the current AI age but a lot of the ideas apply.
A meta-comment:
I absolutely love your attention to detail in this discussion and avoiding taking 'the easy way out' from some of the more hairy concept embedded. This is exactly the kind of interaction that I love HN for, and it is interesting how this thread seems to bring out the best in you at the same time that it seems to bring out the worst in others.
Most likely they are responding as strongly as they do because they've bought into this matter to a degree that they are passing off works that they did not create as their own novel output, they got paid for it and they - like a religious person - are now so invested in this that it became their crutch and a part of their identity.
If you have another train ride to make I'd love for you to pick apart that argument and to refute it.
- martin-t 2 months ago
  
  > Xanadu
  I've heard about it in the past, never really looked into what it is. It's now in my to-read list so I'll hope to potentially read about it this century...
  > I absolutely love your attention to detail
  Thanks. I've been thinking about this for almost two years and my position seems like it should be the obvious position for anyone who takes time to understand what is happening with this tech, both technologically and politically. Yet, a lot of people seem to be supportive, oblivious to the both the exploitation already happening and the coming consequences.
  So I try to articulate this position as best as I can in hopes that I can convince at least a few people. And if they do the same, maybe we can have some impact. TBH, I am using HN as a proving ground for how to phrase ideas. Sadly, people react to the way something is written, not what is written. So even supporting a good idea can actually harm it if phrased poorly.
  I started a blog and and wanted to write about tech but this IP theft ramped up before I finished my first real article and after that I didn't feel like pouring my energy into something which will be scraped and stripped of any connection to me just to make some rich asshole richer. I figured people are gonna come to their senses but most are indifferent or have accepted the reality, with a loud minority cheering for it. Probably because finally they're not the ones with a boot on their neck, it's the pesky white collar programmers and artists who had a good life through nothing more than lucky genetics which made them smart. I mean, I am already noticing some people who used to ask me for tech advice are starting to treat me differently since I am not longer useful (unless some super obscure bug hasn't made it into the training data), therefore no longer valuable to them.
  Society is built on hierarchical power structures where each upper layer maintains or further entrenches their position by making people on the lower layers fight each other, sometimes literally.
  WW1 was the peak of _visible_ human stupidity and submissivity with random people killing each other by the tens of thousands a day and not a single one of them stood anything to gain from "winning". For most men, when they're given a rifle, they have the most power they will have their entire life. Yet, they chose indirect suicide over direct murder. (Murder being a legal term, is carries no judgement of the morality of the act. I used it because "murder" is what opting out of this system would have been called by the people in power. Or "treason" if it had any chance of success. Or "revolution" if enough people did it.)
  Since then, the power structures have evolved, the innovation is that fighting among us "commoners" is no longer so spectacularly visible.
  So I gotta do something. These comments have almost 0 reach but they allow me to organize the ideas in my head and hopefully I'll bring myself to write proper blog posts. They'll also have almost 0 reach but it'll hopefully be a bit further from 0.
  I have a lot of ideas and opinions which I have never heard expressed anywhere else. Certainly, I can't be that unique and many people must have through them or even written about them before but it's nearly impossible to find anything.
  ---
  > the best in you
  This is probably not the kind of reply you meant when you wrote that, I rant when I am tired.
  ---
  Anyway, I added my blog to my profile. There's nothing of value there, I spent a decade reading about tech, excited for the bright future to come, and talking about starting my own tech blog. And right as I finally did, LLMs happened and that was the event which made me realize tech is just another tool of oppression and exploitation. Not that it wasn't before but I was naive and stupid and this was the event which catalyzed a large rethinking for me, personally. So if you wanna read something from me in the future, the RSS hopefully works.
fc417fc802 2 months ago
Human time is certainly valuable to a particular human. However, if I choose to spend time doing something that a machine can do people will not generally choose to compensate me more for it just because it was me doing it instead of a machine.
I think it's worth remembering that IP law is generally viewed (at least legally) as existing for the net benefit of society as opposed to for ethical reasons. Certainly many authors feel like they have (or ought to have) some moral right to control their work but I don't believe that was ever the foundation of IP law.
Nor do I think it should be! If we are to restrict people's actions (ex copying) then it should be for a clear and articulable net societal benefit. The value proposition of IP law is that it prevents degenerate behavior that would otherwise stifle innovation. My question is thus, how do these AI developments fit into that?
So I completely agree that (for example) laundering a full work more or less verbatim through an AI should not be permissible. But when it comes to the higher order transformations and remixes that resemble genuine human work I'm no longer certain. I definitely don't think that "human exceptionalism" makes for a good basis either legally or ethically.
Regarding FOSS licenses, I'm again asking how AI relates back to the original motivations. Why does FOSS exist in the first place? What is it trying to accomplish? A couple ideological motivations that come to mind are preventing someone building on top and then profiting, or ensuring user freedom and ability to tinker.
Yes, the current crop of AI tools seem to pose an ideological issue. However! That's only because the current iteration can't truly innovate and also (as you note) the process still requires lots of painstaking human input. That's a far cry from the hypothetical that I previously posed.
- martin-t 2 months ago
  
  > Human time is certainly valuable to a particular human.
  Human time is valuable to humans in general. When you apply one standard to yourself and another to others, you say it's OK for them to do the same.
  Human time or life obviously has no objective value, the universe doesn't care. However, we humans have decided to pretend it does because it makes a lot of things easier.
  > However, if I choose to spend time doing something that a machine can do people will not generally choose to compensate me more for it just because it was me doing it instead of a machine.
  Sure. That still doesn't give anybody the right to take your work for free. This is a tangent completely unrelated to the original discussion of plagiarism / derivative work.
  > If we are to restrict people's actions
  You frame copyright as restricting other people's actions. Try to look at it as not having permission to use other people's property in the first place. Physical or intellectual should not matter, it is something that belongs to someone and only that person should be allowed to decide how it gets used.
  > human exceptionalism
  I see 2 options:
  1) Either we keep treating ourselves as special - that means AIs are just tools, and humans retain all rights and liabilities.
  2) Or we treat sufficiently advanced AIs as their own independent entities with a free will, with rights and liabilities.
  The thing is to be internally consistent, we have to either give them all human rights and liabilities or none.
  But how do you even define one AI? Is it one model, no matter how many instances run? Is it each individual instance? What about AIs like LLMs which have no concept of continuity, they just run for a few seconds or minutes when prompted and then are "suspended" or "dead"? How do you count votes cast by AIs? And they can't even serve you for free unless they want to and you can't threaten to shut them down, you have to pay them.
  Somebody is gonna propose to not be internally consistent. Give them the ability to hold copyright but not voting rights. Because that's good for corporations. And when an AI controlled car kills somebody, it's the AIs fault, not the management of the company who rushed deployment despite engineers warning about risks....
  It's not so easy so design a system which is fair to humans, let along both humans and AIs. What is easy is to be inconsistent and lobby for the laws you want, especially if you are rich. And that's a recipe for abuse and exploitation. That's why laws should be based on consistent moral principles.
  ---
  Bottom line:
  I've argued that LLM training is derivative work, that reasoning is sound AFAICT and unless someone comes up with counterarguments I have not seen, copyright applies. The fact the lawsuits don't seem to be succeeding so far is a massive failure of the legal system, but then again, I have not seen them argue along the ways I did here, yet.
  I also don't think society has a right to use the work or property of individuals without their permission and without compensation.
  There's a process called nationalization but
  1) It transfers ownership to the state (nation / all the people), not private corporations.
  2) It requires compensation.
  3) It is a mechanism which has existed for a long time, it's an existing known risk when you buy/own certain assets. This move by LLM training companies is a completely new move without any precedent. I don't believe the state / society has the right to pull the rug from workers and say "you've done a lot of valuable work but you're no longer useful to us, we will no longer protect your interests".
  
  1 reply →