Comment by EdgeExplorer
3 months ago
The obsession with protecting access to lyrics is one of the strangest long-running legal battles to me. I will skip tracks on Spotify sometimes specifically because there are no lyrics available. Easy access to lyrics is practically an advertisement for the music. Why do record companies not want lyrics freely available? In most cases, it means they aren't available at all. How is that a good business decision?
They probably fear a domino effect if they let go of this. And so they defend it vehemently to avoid setting a precedent.
Think about compositions, samples, performance rights, and so on. There is a lot more at stake.
What's the benefit of protecting monetary IP rights to art?
We'll only get the art that artists really wanted to make? Great!
> What's the benefit of protecting monetary IP rights to art?
What's the benefit of protecting monetary IP rights to software?
What's the benefit of consolidating all meaningful access to computing services to a few trillion-dollar gate-keeping corpos?
What's the benefit of getting paid for your work? We'll only get the work people really want to do? Great!
Hot take: it’s all bullshit.
Like software patents - when you’re not a normie.
Thoughts by someone who doesn’t make a living by songs?
I’m guessing you’d want to restrict lyrics to encourage more plays of the song by people who are motivated to understand them. Along with the artist’s appreciation of that experience of extracting what you’re fascinated by. Burdensome processes generate love and connection to things.
Not everything is a functional commodity to be used and discarded at whim.
One amusing part of lyrics on Spotify to me is how they don't seem to track which songs are instrumentals or not and use that to skip the message about them not knowing the lyrics. An instrumental will pop up and it will say something like "Sorry, we don't have the lyrics to this one yet".
The only thing funnier than that is when they do have the lyrics to a song that probably doesn't need them, like Hocus Pocus by Focus: https://open.spotify.com/track/2uzyiRdvfNI5WxUiItv1y9?si=7a7...
Oh they track that, it's in their API as the "instrumentalness" score: https://developer.spotify.com/documentation/web-api/referenc...
The fact that they don't do anything with that information is unrelated.
Interesting, especially that it's a probability rather than a boolean! The line can be blurry sometimes (like in the example I mentioned), so it makes sense that it might not be possible to come up with a consistent way of classifying them that everyone would agree with.
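To make the "probability rather than a boolean" point concrete, here is a minimal sketch of how a client could use such a score to pick the placeholder message. Everything in it is an assumption for illustration: the function name, the 0.5 cutoff, and the wording of the "instrumental" message are hypothetical, and this is not how Spotify's app actually works. It only assumes the score is a float between 0.0 and 1.0.

```python
from typing import Optional

# Hypothetical cutoff: scores above this are treated as "probably instrumental".
# The right value is a judgment call, since the line can be blurry.
INSTRUMENTAL_THRESHOLD = 0.5


def lyrics_placeholder(instrumentalness: float, has_lyrics: bool) -> Optional[str]:
    """Return the message to show in place of lyrics, or None if lyrics exist."""
    if not 0.0 <= instrumentalness <= 1.0:
        raise ValueError("instrumentalness must be between 0.0 and 1.0")
    if has_lyrics:
        # Lyrics are available, so no placeholder is needed.
        return None
    if instrumentalness > INSTRUMENTAL_THRESHOLD:
        # Likely no vocals: avoid implying that lyrics are merely missing.
        return "This track appears to be instrumental."
    return "Sorry, we don't have the lyrics to this one yet."


if __name__ == "__main__":
    print(lyrics_placeholder(0.93, has_lyrics=False))
    print(lyrics_placeholder(0.02, has_lyrics=False))
```

A track like the Focus example would presumably land somewhere in the middle, which is exactly where any fixed threshold is going to annoy someone.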
I’ve also seen cases where they list lyrics for a song that doesn’t have any (usually an instrumental jazz version of an old standard).
The content industries should have been the ones to invent LLMs, but their heads are so stuck in the past and in regressive thinking about how they protect their revenue streams that they're incapable of innovating. Publishing houses should have been the ones to have researchers looking into how to computationally leverage their enormous corpus of data. But instead, they put zero dollars into actual research and development and paid the lawyers instead. And so it leads to attitudes like this.
The only people seeing themselves as "content creators" are people giving social media stuff so their users get something they can doom scroll. Other people see themselves as artists, entertainers, musicians, authors, etc.
I'm referring to the rent seekers sitting in between the artists and the public.
“The content industries.”
Why would people invest in destroying what they love?
There is no destruction.
> The content industries should have been the ones to invent LLMs
While exclusively-controlled LLMs would be mildly useful to them, the technology existing is dangerous to them, and they already have a surplus supply of content at low cost that they monetize by controlling discovery, gatekeeping, and promotion, so I don't think it makes sense for them to put energy into LLMs even if they had the technical acumen to recognize the possibilities (much the same way that Google, despite leading in developing the underlying technology, had very little incentive to productize it since it was disruptive to their established business, until someone else already did and the choice was to compete on that or lose entirely.)
You have to get ahead of the disruption that will destroy you. At least, if you care about longevity of your company. I realize this isn't always the case.
That's always been the case, e.g. how they were latecomers to streaming.
Streaming had to compete with digital music piracy. As a result, Spotify is impossibly cheap compared to buying individual albums or singles in the past. So musicians hardly receive any money from recorded music anymore. Nowadays they basically have only concerts left as a means to earn money.
The composition and lyrics are owned separately from the recorded performance.
I'm pretty sure you could even have lyrics with a separate copyright from the composition itself. For example, you can clearly have lyrics without the music and you can have the composition alone in the case that it is performed as an instrumental cover or something.
This is a tough one for the HN crowd. It's like that man not sure which button to push meme.
1) RIAA is evil for enforcing copyrights on lyrics?
2) OpenAI is evil for training on lyrics?
I know nuance takes the fun out of most online discussions, but there's a qualitative difference between a bunch of college kids downloading MP3s on a torrent site and a $500 billion company whose goal, among other things, is to become the primary access point to all things digital.
Should young adults be allowed to violate copyright and no one else? The damages caused seem far worse than an LLM being able to reproduce song lyrics.
Is it simply "we like college kids" and "we hate OpenAI" that dictates this?
I'm ready, hit me with the nuance.
Why not both? As the GP mentioned, lyrics are also invaluable for people besides training for AI.
I think the perceived lack-of-value for them is related to how easy it is to write lyrics down, compared to any other aspect of the music. Anyone can do it within the time of the song, usually first try. Any other aspect of the song can't just be written down from ear (yes, including the sheet music, which isn't nearly expressive enough to reproduce a performance*).
*There are some funny "play from sheet music without knowing the song" type videos out there, with funny results. YouTube/Google search is no longer usable, so I can't find any.
I think you mean the RIAA
RAII is a different kind of (necessary) evil
Indeed, too much C++. Edited.
3) Some types of data are more ethical to train on than others.
Training on Wikipedia? Cool! Training on pirated copies of books? Not cool! Training on lyrics? IMO that's on the "cool" side of the line, because the "product" is not the words, it's the composition and mastered song.
Very true. Just the other day, another “copyright is bad” post on the front page. Today it's “copyright is good” because otherwise people might get some use out of material in LLMs.
Considering this is hacker news, it seems to be such an odd dichotomy. Sometimes it feels like anti-hacker news. The halcyon days of 2010 are long gone. Now we apparently need to be angry at all tech.
LLMs are amazing and I wish they could train on anything and everything. LLMs are the smartphone to the fax machines of Google search.
> Very true. Just the other day, another “copyright is bad” post on the front page. Today it's “copyright is good” because otherwise people might get some use out of material in LLMs.
>
> Considering this is hacker news, it seems to be such an odd dichotomy. Sometimes it feels like anti-hacker news. The halcyon days of 2010 are long gone. Now we apparently need to be angry at all tech.
>
> LLMs are amazing and I wish they could train on anything and everything. LLMs are the smartphone to the fax machines of Google search.
Sorry, this is such a (purposefully?) naive take. In reality the thoughts are much more nuanced. For one, open source/free software doesn't exist without copyright. Then there is the whole issue that these companies use vast amounts of copyrighted material to train their models, arguing that all this is fair use. But on the other hand they lock their models behind walls, disallow training on them, keep the training methods and data selection secret...
This tends to be what people disagree with. It feels very much like different rules for thee and for me. Just imagine how outraged Sam Altman would act if someone leaked the code for Gpt5 and all the training scripts.
If we agree that copyright does not apply to what LLMs are trained on, then it should also not apply to the LLMs themselves, and their makers should be required to release all their models and the way of training them.
My point wasn't supposed to be that copyright is bad (or that it's good), just that the business logic of fighting the sharing of lyrics is incomprehensible to me.
That aside, I think there's a lot more complexity than you're presenting. The issue is who gets to benefit from what work.
As hackers, we build cool things. And our ability to build cool things comes in large part from standing on the shoulders of giants. Free and open sharing of ideas is a powerful force for human progress.
But people also have to eat. Which means even as hackers focused on building cool things, we need to get paid. We need to capture for ourselves some of the economic value of what we produce. There's nothing wrong with wanting to get paid for what you create.
Right now, there is a great deal of hacker output the economic value of which is being captured almost exclusively by LLM vendors. And sure, the LLM is more amazing than whatever code or post or book or lyric it was trained on. And sure, the LLM value comes from the sum of the parts of its source material instead of the value of any individual source. But fundamentally the LLM couldn't exist without the source material, and yet the LLM vendor is the one who gets to eat.
The balance between free and open exchange of ideas and paying value creators a portion of the value they create is not an easy question, and it's not anti-hacker to raise it. There are places where patents and other forms of exclusive rights seem to be criminally mismanaged, stifling progress. But there's also "some random person in Nebraska" who has produced billions of dollars in value and will never see a penny of it. Choosing progress alone as the goal will systematically deprive and ultimately drive away the very people whose contributions are enabling the progress. (And of course choosing "fair" repayment alone as the goal will shut down progress and allow less "fair" players to take over... that's why this isn't easy.)
Sounds like it was never about copyright as a principle, only symbolic politics (ie. copyrights benefit megacorps? copyright needs to be weaker! copyright hurts megacorps? copyright needs to be stronger!)
Actually in Germany it's GEMA
It's a good decision because it must be an incredible minority of people who only listen to music when the lyrics can be displayed. I'd imagine most people aren't even looking at the music playing app while listening to music. Regardless, they are copyrighted and they get license fees from parties that do license them and they make money that way. Likely much more money than they would make from the streams they are losing from you.
I think it depends on the music. Most people will have a greatly improved experience when listening to opera if they have access to (translated) lyrics. Even if you know the language of an opera, it can be extremely difficult for a lot of people to understand the lyrics due to all the ornamentation.
What percentage of streaming income does opera, as a genre, represent such that it could even factor into this business decision?
I think having the lyrics reproducible in text form isn't the problem. Many sites have been doing that for decades and as far as I know record companies haven't gone after them. But these days with generative AI, they can take lyrics and just make a new song with them, and you can probably see why artists and record companies would want to stop that.
Plus, from TFA,
"GEMA hoped discussions could now take place with OpenAI on how copyright holders can be remunerated."
Getting something back is better than nothing
I didn't downvote, but
> I think having the lyrics reproducible in text form isn't the problem. Many sites have been doing that for decades and as far as I know record companies haven't gone after them.
Reproducing lyrics in text form is, in fact, a problem, independent of AI. The music industry has historically been aggressively litigious in going after websites which post unlicensed song lyrics[0]. There are many arcane and bizarre copyright rules around lyrics. E.g., if you've ever watched a TV show with subtitles where there's a musical number but none of the lyrics are subtitled, you might think it was just laziness, but it's more likely the subtitlers didn't have permission to translate and subtitle the lyrics. And many songs on Spotify which you'd assume would have lyrics available just don't, because Spotify doesn't have the rights to publish them.
[0] https://www.billboard.com/music/music-news/nmpa-targets-unli...
Thanks. Maybe that misconception was the problem. Taking a hammering in downvotes, lol
Had a couple of drive-by downvotes... Is it that stupid an opinion? Granted I know nothing about the case except for what's in TFA
I'm not one of the downvoters, but it may be this: "Many sites have been doing that for decades and as far as I know record companies haven't gone after them."
Record companies have in fact, for decades, been going after sites for showing lyrics. If you play guitar, for example, it's almost impossible to find chords/tabs that include the lyrics because sites get shut down for doing that.
> Had a couple of drive-by downvotes... Is it that stupid an opinion?
While I do not agree with your take, FWIW I found your comment substantive and constructive.
You seem to be making two points that are both controversial:
The first is that generative AI makes the availability of lyrics more problematic, given new kinds of reuse and transformation it enables. The second is that AI companies owe something (legally or morally) to lyric rights holders, and that it is better to have some mechanism for compensation, even if the details are not ideal.
I personally do not believe that AI training is meaningfully different from traditional data analysis, which has long been accepted and rarely problematized.
While I understand that reproducing original lyrics raises copyright issues, this should only be a concern in terms of reproduction, not analysis. Example: Even if you do no data analysis at all and your random character generator publishes the lyrics of a famous Beatles song (or other forbidden numbers) by sheer coincidence, it would still be a copyright issue.
I also do not believe in selective compensation schemes driven by legal events. If a legitimate mechanism for rights holders cannot be constructed in general, it is poor policy craftsmanship to privilege the music industry specifically.
Doing so relieves the pressure to find a universal solution once powerful stakeholders are satisfied. While this might be seen as setting a useful precedent by small-scale creators, I doubt it will help them.
It's like saying that movie studios haven't gone after Netflix over movies, so what's the issue with hosting pirated movies on your own site. The reason movie studios don't go after Netflix is that they have a license to show it.
If anything, AI would scramble the lyrics more than a human "taking lyrics to make a new song from them".
Likely because you're a "luddite", which in the current atmosphere of HN and other tech spaces means you have a problem with a "research institution" (one with a separate for-profit enterprise face that it wears when it feels like it) having free and open access to the collected works of humanity so it can create a plagiarism machine that it can then charge people to access.
I don't respect this opinion but it is unfortunately infesting tech spaces right now.