Comment by briandear

3 months ago

Very true. Just the other day, another “copyright is bad” post on the front page. Today it's “copyright is good” because otherwise people might get some use out of material in LLMs.

Considering this is Hacker News, it seems to be such an odd dichotomy. Sometimes it feels like anti-hacker news. The halcyon days of 2010 are long gone. Now we apparently need to be angry at all tech.

LLMs are amazing and I wish they could train on anything and everything. LLMs are the smartphone to the fax machines of Google search.

> Very true. Just the other day, another “copyright is bad” post on the front page. Today it's “copyright is good” because otherwise people might get some use out of material in LLMs.
>
> Considering this is Hacker News, it seems to be such an odd dichotomy. Sometimes it feels like anti-hacker news. The halcyon days of 2010 are long gone. Now we apparently need to be angry at all tech.
>
> LLMs are amazing and I wish they could train on anything and everything. LLMs are the smartphone to the fax machines of Google search.

Sorry, this is such a (purposefully?) naive take. In reality the thoughts are much more nuanced. For one, open source/free software doesn't exist without copyright. Then there is the whole issue that these companies use vast amounts of copyrighted material to train their models, arguing that all of this is fair use. But on the other hand they lock their models behind walls, disallow training on them, keep the training methods and data selection secret...

This tends to be what people disagree with. It feels very much like different rules for thee and for me. Just imagine how outraged Sam Altman would act if someone leaked the code for GPT-5 and all the training scripts.

If we agree that copyright does not apply to the material LLMs are trained on, then it should also not apply to the LLMs themselves, and the vendors should be required to release all their models and the way they train them.

  • Does that mean you would support open LLM model training on copyrighted data?

    • I think that opens several other cans of worms, but in principle I would support a solution that allows using copyrighted materials if it is for the common good (i.e. the results are released fully open, meaning not just the weights but everything else).

      As a side note, I am definitely not a strong supporter of IP rights, but I can see the benefits of copyright much more clearly than those of patents.

My point wasn't supposed to be that copyright is bad (or that it's good), just that the business logic of fighting the sharing of lyrics is incomprehensible to me.

That aside, I think there's a lot more complexity than you're presenting. The issue is who gets to benefit from what work.

As hackers, we build cool things. And our ability to build cool things comes in large part from standing on the shoulders of giants. Free and open sharing of ideas is a powerful force for human progress.

But people also have to eat. Which means even as hackers focused on building cool things, we need to get paid. We need to capture for ourselves some of the economic value of what we produce. There's nothing wrong with wanting to get paid for what you create.

Right now, there is a great deal of hacker output the economic value of which is being captured almost exclusively by LLM vendors. And sure, the LLM is more amazing than whatever code or post or book or lyric it was trained on. And sure, the LLM value comes from the sum of the parts of its source material instead of the value of any individual source. But fundamentally the LLM couldn't exist without the source material, and yet the LLM vendor is the one who gets to eat.

The balance between free and open exchange of ideas and paying value creators a portion of the value they create is not an easy question, and it's not anti-hacker to raise it. There are places where patents and other forms of exclusive rights seem to be criminally mismanaged, stifling progress. But there's also "some random person in Nebraska" who has produced billions of dollars in value and will never see a penny of it. Choosing progress alone as the goal will systematically deprive and ultimately drive away the very people whose contributions are enabling the progress. (And of course choosing "fair" repayment alone as the goal will shut down progress and allow less "fair" players to take over... that's why this isn't easy.)

Sounds like it was never about copyright as a principle, only symbolic politics (i.e. copyright benefits megacorps? Copyright needs to be weaker! Copyright hurts megacorps? Copyright needs to be stronger!).