Comment by EnPissant
3 months ago
This is a tough one for the HN crowd. It's like that meme of the guy sweating over which button to push.
1) RIAA is evil for enforcing copyrights on lyrics?
2) OpenAI is evil for training on lyrics?
I know nuance takes the fun out of most online discussions, but there's a qualitative difference between a bunch of college kids downloading MP3s on a torrent site and a $500 billion company whose goal, among other things, is to become the primary access point to all things digital.
Should young adults be allowed to violate copyright and no one else? The damages caused seem far worse than an LLM being able to reproduce song lyrics.
Is it simply "we like college kids" and "we hate OpenAI" that dictates this?
I'm ready, hit me with the nuance.
A young adult who pirates is also more likely to make purchases in that industry, and their impact is limited.
A corporation that pirates is more likely to pirate, en masse and on an ongoing basis, everything it can get its hands on, and to throw everything it can at defending its right to do so in court.
Maybe individuals and corporations are different enough that copyright should not work the same way for both.
What damages? You can learn lyrics by listening to the song.
Why not both? As the GP mentioned, lyrics are also invaluable to people for purposes other than AI training.
I think the perceived lack of value in them comes from how easy it is to write lyrics down compared to any other aspect of the music. Anyone can do it within the length of the song, usually on the first try. No other aspect of a song can just be written down by ear (yes, including the sheet music, which isn't nearly expressive enough to reproduce a performance*).
*There are some "play from sheet music without knowing the song" videos out there, with funny results. YouTube/Google search is no longer usable, so I can't find any.
I think you mean the RIAA
RAII is a different kind of (necessary) evil
Indeed, too much C++. Edited.
3) Some types of data are more ethical to train on than others.
Training on Wikipedia? Cool! Training on pirated copies of books? Not cool! Training on lyrics? IMO that's on the "cool" side of the line, because the "product" is not the words, it's the composition and mastered song.
Very true. Just the other day there was another "copyright is bad" post on the front page. Today it's "copyright is good," because otherwise people might get some use out of the material in LLMs.
Considering this is Hacker News, it seems like such an odd dichotomy. Sometimes it feels like anti-Hacker News. The halcyon days of 2010 are long gone. Now we apparently need to be angry at all tech.
LLMs are amazing and I wish they could train on anything and everything. LLMs are the smartphone to Google search's fax machine.
Sorry, but this is such a (purposefully?) naive take. In reality the thinking is much more nuanced. For one, open source/free software doesn't exist without copyright. Then there is the whole issue that these companies use vast amounts of copyrighted material to train their models, arguing that all of it is fair use. But on the other hand, they lock their models behind walls, disallow training on them, and keep their training methods and data selection secret...
This tends to be what people disagree with. It very much feels like "rules for thee but not for me." Just imagine how outraged Sam Altman would be if someone leaked the code for GPT-5 and all the training scripts.
If we agree that copyright does not apply to the data LLMs are trained on, then it should not apply to the LLMs themselves either, and their makers should be required to release all their models and the way they were trained.
Does that mean you would support training open LLMs on copyrighted data?
My point wasn't supposed to be that copyright is bad (or that it's good), just that the business logic of fighting the sharing of lyrics is incomprehensible to me.
That aside, I think there's a lot more complexity than you're presenting. The issue is who gets to benefit from what work.
As hackers, we build cool things. And our ability to build cool things comes in large part from standing on the shoulders of giants. Free and open sharing of ideas is a powerful force for human progress.
But people also have to eat. Which means even as hackers focused on building cool things, we need to get paid. We need to capture for ourselves some of the economic value of what we produce. There's nothing wrong with wanting to get paid for what you create.
Right now, there is a great deal of hacker output the economic value of which is being captured almost exclusively by LLM vendors. And sure, the LLM is more amazing than whatever code or post or book or lyric it was trained on. And sure, the LLM value comes from the sum of the parts of its source material instead of the value of any individual source. But fundamentally the LLM couldn't exist without the source material, and yet the LLM vendor is the one who gets to eat.
The balance between free and open exchange of ideas and paying value creators a portion of the value they create is not an easy question, and it's not anti-hacker to raise it. There are places where patents and other forms of exclusive rights seem to be criminally mismanaged, stifling progress. But there's also "some random person in Nebraska" who has produced billions of dollars in value and will never see a penny of it. Choosing progress alone as the goal will systematically deprive and ultimately drive away the very people whose contributions are enabling the progress. (And of course choosing "fair" repayment alone as the goal will shut down progress and allow less "fair" players to take over... that's why this isn't easy.)
Sounds like it was never about copyright as a principle, only symbolic politics (i.e. copyright benefits megacorps? Copyright needs to be weaker! Copyright hurts megacorps? Copyright needs to be stronger!)
Actually, in Germany it's GEMA.