The obsession with protecting access to lyrics is one of the strangest long-running legal battles to me. I will skip tracks on Spotify sometimes specifically because there are no lyrics available. Easy access to lyrics is practically an advertisement for the music. Why do record companies not want lyrics freely available? In most cases, it means they aren't available at all. How is that a good business decision?
Thoughts by someone who doesn’t make a living by songs?
I’m guessing you’d want to restrict lyrics to encourage more plays of the song by people who are motivated to understand them. Along with the artist’s appreciation of that experience of extracting what you’re fascinated by. Burdensome processes generate love and connection to things.
Not everything is a functional commodity to be used and discarded at whim.
One amusing part of lyrics on Spotify to me is how they don't seem to track which songs are instrumentals or not and use that to skip the message about them not knowing the lyrics. An instrumental will pop up and it will say something like "Sorry, we don't have the lyrics to this one yet".
The content industries should have been the ones to invent LLMs, but their head is so stuck in the past and in regressive thinking about how they protect their revenue streams that they're incapable of innovating. Publishing houses should have been the ones to have researchers looking into how to computationally leverage their enormous corpus of data. But instead, they put zero dollars into actual research and development and paid the lawyers instead. And so it leads to attitudes like this.
The only people seeing themselves as "content creators" are people giving social media stuff so their users get something they can doom scroll. Other people see themselves as artists, entertainers, musicians, authors, etc.
> The content industries should have been the ones to invent LLMs
While exclusively-controlled LLMs would be mildly useful to them, the technology existing is dangerous to them, and they already have a surplus supply of content at low cost that they monetize by controlling discovery, gatekeeping, and promotion, so I don't think it makes sense for them to put energy into LLMs even if they had the technical acumen to recognize the possibilities (much the same way that Google, despite leading in developing the underlying technology, had very little incentive to productize it since it was disruptive to their established business, until someone else already did and the choice was to compete on that or lose entirely.)
I'm pretty sure you could even have lyrics with a separate copyright from the composition itself. For example, you can clearly have lyrics without the music and you can have the composition alone in the case that it is performed as an instrumental cover or something.
I know nuance takes the fun out of most online discussions, but there's a qualitative difference between a bunch of college kids downloading mp3s on a torrent site and a $500 billion company whose goal, among other things, is to become the primary access point to all things digital.
3) Some types of data are more ethical to train on than others.
Training on Wikipedia? Cool! Training on pirated copies of books? Not cool! Training on lyrics? IMO that's on the "cool" side of the line, because the "product" is not the words, it's the composition and mastered song.
Very true. Just the other day, another “copyright is bad” post on the front page. Today it's “copyright is good” because otherwise people might get some use of material in LLMs.
Considering this is hacker news, it seems to be such an odd dichotomy. Sometimes it feels like anti-hacker news. The halcyon days of 2010 are long gone. Now we apparently need to be angry at all tech.
LLMs are amazing and I wish they could train on anything and everything. LLMs are the smartphone to the fax machines of Google search.
It's a good decision because it must be an incredible minority of people who only listen to music when the lyrics can be displayed. I'd imagine most people aren't even looking at the music playing app while listening to music. Regardless, they are copyrighted and they get license fees from parties that do license them and they make money that way. Likely much more money than they would make from the streams they are losing from you.
I think it depends on the music. Most people will have a greatly improved experience when listening to opera if they have access to (translated) lyrics. Even if you know the language of an opera, it can be extremely difficult for a lot of people to understand the lyrics due to all the ornamentation.
I think having the lyrics reproducible in text form isn't the problem. Many sites have been doing that for decades and as far as I know record companies haven't gone after them. But these days with generative AI, they can take lyrics and just make a new song with them, and you can probably see why artists and record companies would want to stop that.
Plus, from TFA,
"GEMA hoped discussions could now take place with OpenAI on how copyright holders can be remunerated."
> I think having the lyrics reproducible in text form isn't the problem. Many sites have been doing that for decades and as far as I know record companies haven't gone after them.
Reproducing lyrics in text form is, in fact, a problem, independent of AI. The music industry has historically been aggressively litigious in going after websites which post unlicensed song lyrics[0]. There are many arcane and bizarre copyright rules around lyrics. e.g. If you've ever watched a TV show with subtitles where there's a musical number but none of the lyrics are subtitled, you might think it was just laziness, but it's more likely the subtitlers didn't have permission to translate&subtitle the lyrics. And many songs on Spotify which you'd assume would have lyrics available, just don't, because they don't have the rights to publish them.
Everyone knows that these LLMs were trained on copyrighted material, and as a next-token prediction model, LLMs are strongly inclined to reproduce text they were trained on.
All AI companies know they're breaking the law. They all have prompts effectively saying "Don't show that we broke the law!". That we continue to have tech companies consistently breaking the law and nothing happens is an indictment of our current economy.
And it's a question of whether we accept breaking the law for the possibility of having the greatest technological advancement of the 21st century. In my opinion, the legal system has become a blocker for a lot of innovation, not only in AI but elsewhere as well.
I don’t read this as “don’t show we broke the law,” I read it as “don’t give the user the false impression that there’s any legal issue with this generated content.”
There’s nothing law breaking about quoting publicly available information. Google isn’t breaking the law when it displays previews of indexed content returned by the search algorithm, and that’s clearly the approach being taken here.
Training on copyrighted material is not illegal. Even in the lawsuit against Anthropic it was found to be fair use.
Pirating material is a violation of copyright, which some labs have done, but that has nothing to do with training AI and everything to do with piracy.
Post-trained models are strongly inclined to produce responses similar to what got them a high RL score. It's slightly wrong to keep thinking of LLMs as just next-token prediction from the dataset's probability distribution, as if they were some Markov chain.
> Since the output would only be generated as a result of user inputs known as prompts, it was not the defendants, but the respective user who would be liable for it, OpenAI had argued.
Another glimpse into the "mind" of a tech corporation allowing itself full freedom to profit from the vast body of human work available online, while explicitly declining any societal responsibility at all. It's the user's fault, he wrote an illegal prompt! We're only providing the "technology"!
This is largely how it works for nearly all copyrightable work. I can draw Mickey Mouse but legally I'm not doing anything wrong until I try to sell it. It certainly doesn't put Crayola or Adobe at legal risk for me to do so.
But you are not the one drawing Mickey Mouse in this scenario, are you? You are instructing the AI company to draw something or, closer to the original post, you are prompting it to generate the lyrics for song X.
Your prompt may be asking for something illegal (i.e. reproducing the lyrics), but the one reproducing the lyrics is the AI company, not you yourself.
In your example you are asking Adobe to draw Mickey Mouse and Adobe happily draws a perfect rendition of Mickey Mouse for you and you have to pay Adobe for that image.
Not really, if I ask an artist to draw me a Mickey Mouse (for money) who is committing copyright infringement?
It's an interesting observation that the big AI corps very much argue that learning "is the same that humans do", so fair use. But then when it comes to using that learning they argue the other way, i.e. "this is just a machine, it's the person asking who is doing the infringement".
While I partially understand (but do not support) the hate against AI due to possible plagiarism and "low effort generation" of works, think about the whole process: if model providers are held liable for generating output that resembles lyrics or very short texts that fall under copyright law, they will just change their business model.
E.g. why offer lame chat agents as a service, when you can keep the value generation in-house? E.g. have a strategy board that identifies possible use cases for your model, then spin off a company that just does agentic coding or music generation. Just cut off the end users/public from model access, and flood the market with AI generated apps/content/works yourself (or with selected partners). Then have a lawyer check everything right before publishing.
So this court decision may turn everything worse? I don't know.
The fact they don't already do that, sounds to me like the things produced by AI are not worth the investment. Especially since the output is not copyrightable, right?
If there was a lot of gold to find they wouldn't sell the shovels.
There is a lot of value in specialization. It allows capitalism to do its magic to elevate the best uses of your technology without yourself taking on any of the risk. Trying to inhouse everything often smothers innovation and leads to bad resource allocation. It can be done, but in fields with a lot of ongoing innovation it's extremely hard to get right
There is a reason that Cisco doesn't offer websites, and you are probably actively ignoring whatever websites your ISP has. ASML isn't making chips, and TSMC isn't making chip designs
If there is such an immense value in spinning off and selling models separately you can bet that will happen - without court saying so. At the end running these models is a costly job and you'd want to squeeze out every value.
A media generation company that is forced to publish uncopyrightable works, because it cannot make the usage to these media generators public, since that would violate copyright - that does sound like a big win for everyone but that company.
Because that's the only business model that the management of these model provider companies suspect to have a chance of generating income, at the current state.
> While I partially understand (but not support) the hate against AI due to possible plagiarism
There's no *possible* plagiarism, every AI slop IS result of plagiarism.
> E.g. have a strategy board that identifies possible use cases for your model, then spin off a company that just does agentic coding, music generation.
Having lame chat agents as a service does not preclude them from doing this. The fact that they are only selling the shovels should be somewhat insightful.
I think in the end they will just pay off copyright holders. The German GEMA is mostly interested in rent-seeking through whatever means available, it's basically the whole point of the organization.
They'll easily be paid off once all legal avenues are exhausted for OpenAI. Though they'll of course keep fighting in court in the hopes of some more favorable negotiating position.
Another instance of GEMA fighting an American company. Anyone who was on the German internet in the first half of the last decade remembers the "not available in your country" error messages on YouTube because Google didn't make a deal with GEMA.
I don't think that we will end up here with such a scenario: lyrics are pervasive and probably also quoted in a lot of other publications. Furthermore, it's not just about lyrics but one can make a similar argument about any published literary work. GEMA is for music but for literary publications there is VG Wort who in fact already have an AI license.
I rather think that OpenAI will license the works from GEMA instead. Ultimately this will be beneficial for the likes of OpenAI because it can serve as a means to keep out the small players. I'm sure that GEMA won't talk to the smaller startups in the field about licensing.
Is this good for the average musician/author? These organizations will probably distribute most of the money to the most popular ones, even though AI models benefit from quantity of content rather than popularity.
I am curious what happens if they call their bluff on this and cut off ChatGPT in Germany. Not that I think OpenAI is doing the right thing, just, I don’t think a country’s government can justify no commercial LLMs to its populace.
There are many competing providers of commercial LLMs with equal capabilities, so another vendor would probably be happy to serve a leading Western market of 83 million people.
There are 80 million Germans. If you were OpenAI, or its shareholders, would you leave that market open for a competitor? No, you'd make a version of your product without the lyrics. More EU countries are going to follow and reach the same conclusion, especially now that Germany has set a legal precedence. Should OpenAI just pull out of a market with 500 million people and leave it to Claude, Perplexity or someone else entirely?
It doesn't appear that modern LLMs are really that hard to build, expensive perhaps, but if you have monopoly on a large enough market, price isn't really your main concern.
> More EU countries are going to follow and reach the same conclusion, especially now that Germany has set a legal precedence.
That's not how laws and regulations work in European or even EU countries. Courts/the legal system in Germany can not set legal precedents for other countries, and countries don't use legal precedents from other countries, as they obviously have different laws. It could be cited as an authority, but no one is obligated to follow that.
What could happen for example, would be that EU law is interpreted through the CJEU (Court of Justice of the European Union), and its rulings bind EU member states, but that's outside of what individual countries do.
Sidenote, I'm not a native English speaker, but I think it's "precedent", not "precedence". They're similar words, but the first one is specifically what I think you meant.
This assumes that tech companies can act above the law because they've got a new feature to jam down our throats. Have you considered that not everyone wants that? Or that it might not be the best thing?
Conversely, last week we had Spain being willing to cut off Cloudflare (!) to protect football match royalties.
> I don’t think a country’s government can justify no commercial LLMs to its populace.
Counter-argument: can any country's government justify allowing its population and businesses to become completely dependent on an overseas company which does not comply with its laws? (For Americans, think "China" in this case)
I come from the country with the world’s oldest continuous parliament, and they change the law all the time. Arguably that’s all the majority of politicians do.
> I don’t think a country’s government can justify no commercial LLMs to its populace
They're not saying no LLMs, they're saying no LLMs using lyrics without a license. OpenAI simply need to pay for a license, or train an LLM without using lyrics.
But lyrics are just one example. Are you saying that training experiments must filter out all substrings from the training input that bear too close a resemblance to a substring of a copyrighted work?
This obviously applies to all copyrighted works. I could sue OpenAI when it reproduces my source code that I published on the Internet.
They already "filter" the code to prevent it from happening (reproducing exact works). My guess it is just superficially changing things around so it is harder to prove copyright violations.
I would think as a matter of practice AI companies would attempt to detect long strings that appeared frequently in their corpus and dedup them out. There isn’t any value in training over and over again on the same data, and the copyright danger of being able to exactly reproduce your training set is obvious. Perhaps they did it intentionally, using the ability to reproduce copyrighted material as a way to get customers early on, knowing they would have to pay a paltry fee for it later.
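For what it's worth, the "detect long strings that appear frequently and dedup them out" idea can be sketched in a few lines. This is only a toy illustration, not how any AI company actually does it: the function names, the 8-word window and the 3-document threshold are all assumptions made up for the example, and it drops whole documents rather than trimming the repeated span.

```python
from collections import Counter


def shingles(text, n):
    """Return the set of n-word windows ("shingles") in a text."""
    words = text.split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def frequent_spans(corpus, n=8, min_docs=3):
    """Find n-word spans that occur in at least min_docs documents."""
    counts = Counter()
    for doc in corpus:
        counts.update(shingles(doc, n))  # set: each doc counts a span once
    return {span for span, c in counts.items() if c >= min_docs}


def dedup(corpus, n=8, min_docs=3):
    """Keep only the first document containing each frequently repeated span.

    Crude document-level dedup: a later document is dropped entirely if it
    shares any frequent span with a document that was already kept.
    """
    repeated = frequent_spans(corpus, n, min_docs)
    seen = set()
    kept = []
    for doc in corpus:
        hits = shingles(doc, n) & repeated
        if hits & seen:
            continue  # a repeated span from this doc is already represented
        seen |= hits
        kept.append(doc)
    return kept


# Hypothetical corpus: the same made-up chorus pasted on three "sites".
chorus = "la la la this made up chorus repeats on every site"
corpus = [
    chorus + " posted on site a",
    "intro " + chorus,
    chorus + " third copy here",
    "completely unrelated text about copyright law in germany today ok",
]
print(dedup(corpus))
```

Real pipelines would more likely work at the substring level with hashing or suffix arrays, since exact word windows miss copies with small edits.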
Of course the models are not human, but if you consider this situation as if they are persons, then the question becomes: May a person read lyrics and tell it to someone when asked, and the court's ruling basically says no, this may not happen, which makes little sense.
I guess the main difference between the situation with language models and humans is one of scale.
I think the question should be viewed like this, if I as a corporation do the same thing but just with humans, would it be legal or not. Given a hypothetical of hiring a bunch of people, having them read a bunch of lyrics, and then having them answer questions about lyrics. If no law prohibits the hypothetical with people, then I don't see why it should be prohibited with language models, and if it is prohibited with people, then there should be no specific AI ruling needed.
All this being said, Europe is rapidly becoming even more irrelevant than it was, living off the largesse of the US and China. It's like some uncontacted tribe ruling that satellites can't take aerial photos of them. It's all good and well, just irrelevant. I guess Germany can always go the route of North Korea if they want.
> "May a person read lyrics and tell it to someone when asked"
If you sell tickets to an event where you read the lyrics aloud, it's commercial performance and you need to pay the author. (Usually a cover artist would be singing, but that's not a requirement.)
So it's not like a human can recite the lyrics anywhere freely either.
If someone hires me as a secretary, and they ask me what the lyrics of a song are, there is no law that prohibits me from telling them if I know, and I don't have to license the lyrics in order to do so.
If they hire me primarily to recite lyrics, then sure, that would probably be some manner of infringement if I don't license them. But I feel like the case with a language model is much more the former than the latter.
> May a person read lyrics and tell it to someone when asked, and the court's ruling basically says no, this may not happen, which makes little sense.
I think the difference here is that your example is what a search engine might do, whereas AI is taking the lyrics, using them to create new lyrics, and then passing them off as its own.
> whereas AI is taking the lyrics, using them to create new lyrics, and then passing them off as its own.
Is this not something every single creative person ever has done? Is this not what creating is? We take in the world, and then create something based on that.
A key takeaway from this ruling is that "the systems contain copies of the original works." Does this mean that offering any open-weight model capable of reproducing copyrighted text snippets or lyrics will be prohibited?
That would be a big setback for AI development in the EU.
That's what the lawsuit of the New York Times is about - OpenAI reproducing complete texts of NYT articles without paying for the reproduction of said articles. This is not an EU issue, but a general unsolved legal grey zone for the whole AI market.
Can't they just ask for copies of the lyrics they are not allowed to use and s/lyrics//g the training set? I imagine the volume of text that would be removed is relatively minuscule.
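A rough sketch of what that s/lyrics//g pass could look like, assuming the rights holder hands over a plain list of lyric texts. Everything here (`lyric_pattern`, `strip_lyrics`, the placeholder lyric) is hypothetical and for illustration only; it only catches verbatim copies, modulo case and whitespace.

```python
import re


def lyric_pattern(lyric):
    """Build a regex matching the lyric regardless of case, spacing or line breaks."""
    words = lyric.split()
    return re.compile(r"\s+".join(re.escape(w) for w in words), re.IGNORECASE)


def strip_lyrics(documents, blocked_lyrics):
    """Remove every verbatim occurrence of a blocked lyric from each document."""
    patterns = [lyric_pattern(l) for l in blocked_lyrics if l.split()]
    cleaned = []
    for doc in documents:
        for pattern in patterns:
            doc = pattern.sub("", doc)
        cleaned.append(doc)
    return cleaned


# Hypothetical data: a made-up lyric and two training documents.
blocked = ["placeholder chorus line one"]
docs = [
    "My favourite song goes: Placeholder  chorus\nline one. Great tune.",
    "No lyrics here.",
]
print(strip_lyrics(docs, blocked))
```

The catch is that misquoted, translated or partially quoted lyrics slip straight through, so an exact-match filter like this is probably a lower bound on the work involved rather than the whole job.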
I made a living from GEMA payments some while back, but dear lord, so much of how the institution does what it does feels so bad and zero-sum. Might just be that the world would be better off without it. It does something important for right holders for sure, but (and I understand, I am heavily back-seating here without offering a solution) there must be better ways to go about it.
Now, without the Filmförderung all those grim dark arthouse movies where people yell "Scheisse!" in Berlin stairwells would never be made. And all that public gremium pleasing shovelware, looking extra cute and boring, clogging up the app stores with zero sales, what would we do without that.
Take anything popular streaming-wise and ask yourself whether it would get through. And if it was stopped, by what and by whom... fire that, to fix Germany's media sector.
Nah. It’s so easy for OpenAI to modify their output. I’m already seeing them restrict news article re-generation by newspaper name. They do it to reduce liability. There’s also a big copyright infringement case coming up in the USA this year, and being able to point to responsiveness to complaints will be a key part of their legal defense I bet.
You can modify the output but the underlying model is always susceptible to jail breaks.
A method I tried a couple months ago to reliably get it to explain to me how to cook meth step by step still works. I’m not gonna share it, you just have to take my word on this.
I'm not sure about the problem here. Lyrics are public: you can search '$songname lyrics' and get the result on a website (or even on the search engine page). What's the issue with an LLM producing those lyrics if you ask?
Long ago the first site I remember to do this was lyrics.ch, which was long since shut down by litigation. I'm not endorsing the status quo here, but if the licensing system exists it is obviously unfair to exempt parties from it simply because they're too big to comply.
Yeah it is tricky in the current climate who to say “fuck you” to first. GEMA does at least represent human artists a bit. Nobody I know in music or any other creative industries has given a blanket allowance for AI companies hoovering up their artworks to then regurgitate for profit. Pirating music to hear it is one thing. Cloning it with modifications to resell is a whole other thing.
Copyright law continues to stifle innovation. The DMCA needs to be abolished, and we need to entirely rethink our modern economic system with respect to creative industries (including software development). The cat is not going back in the bag.
We are sitting on the precipice of the greatest technological advancement in history, and rent-seeking industry titans have convinced us that we must stop this unstoppable technological advancement in order to protect the livelihood of artists who already receive cents on the dollar for their efforts.
Doesn't that sound familiar? This is what they have done time and again, and each time they have lost, leading to a huge loss of potential revenue for creatives as people make use of technological breakthroughs.
Then when companies like Netflix finally get everyone on board with streaming and paying for content with modern conveniences, industry titans step back in to demand larger slices of the pie, until the entire system is ruined and people return to piracy and consumption of older media. Don't even get me started on Spotify.
The technological benefit of modern machine learning models is just too large to ignore. These are becoming important tools, which put power back in the hands of the people, of the consumer. A lot of the grassroots anti-AI movements we see in the creative space can be traced back to corporate propaganda or financial backing. A lot of these people really think they're doing what's best for artists. But I just see Blockbuster all over again. We should make an effort not to be on the wrong side of history.
Ah yes, give away the protection that also protects me, the small person, should I write a song, write a book, come up with a compelling software concept, come up with a way to improve food growth, should go away because it's 'rent seeking' in order to be replaced with.... rent seeking trillion dollar valuation tech companies?
Prior to the current tech bros economy one of the number one ways average people moved up to being rich in the USA was all enabled purely by the copyright/patents laws protections you want to do away with.
I said nothing about patents (though I definitely have feelings about software patents you probably wouldn't like), I simply stated that it is farcical and dishonest to build an economic system predicated upon the restriction of first-amendment rights of consumers, such as the DMCA which prevents me from making copies of my files and sharing them with other people, which, outside of national security threats, is an ethically bankrupt proposition.
The reality is that these small people of which you speak are still beholden to a rent-seeking industry that exploits artists en masse. Most creatives historically got next to nothing. Most artists don't get anything close to rich off of their work.
Yet today, we are able to directly support creatives, and many creatives do quite well by managing the long tail and curating a small, but dedicated patronage. Many of these creatives make more money with such a model, while still allowing their work to be shared with proper attribution.
We have just been brainwashed by a century of corporate interests sticking their hand into every facet of the creative industry, convincing us that the systems they've built over decades are the only way for things to be, even if it infringes upon the rights of others.
Also, while I develop software as a trade, I have also been an artist my entire life, working across many mediums, and so my opinions about the intersection of creativity and technology are not just those of some "tech bro", and I don't think that kind of framing is productive or fair. Especially considering I grew up poor and homeless as a teenager, and have had to reckon with the economic prospects of which you speak much more closely than most.
Lyrics produced some of the first AI slop I noticed after ChatGPT was launched in late 2022, even if the large models hadn’t been trained on them specifically. Overnight there were a bunch of different advertising-laden sites that clearly scraped Genius or other lyric websites, and then had GPT generate commentaries on what the lyrics supposedly mean, so that these would get picked up by search engines.
The result was mostly comical, the commentaries for vacuous pop music all sounded more or less the same: “‘Shake Your Booty’ by KC and the Sunshine Band expresses the importance of letting one’s hair down and letting loose. The song communicates to listeners how liberating it is to gyrate one’s posterior and dance.” Definitely one of the first signs that this new tech was not going to be good for the web.
With AI slop showing up everywhere, there’s a real danger that folks will just no longer be motivated to produce real original content.
With all major models now basically trained on nearly all available data, beyond the financial AI bubble about to burst there’s also a big content bubble that’s about exhausted as folks are just pumping out slop vs producing original creative human output. That may be the ultimate long term tragedy of the present AI hype cycle. Expect “made by a human” to soon be a tag associated with premium brands and customer experiences.
> Expect “made by a human” to soon be a tag associated with premium brands and customer experiences.
I went to a grammar school and I write in mostly pretty high-quality sentences with a bit of British English colloquialism. I spell well, spend time thinking about what I am saying and try to speak clearly, etc.
I've always tried to be kind about people making errors but I am currently retraining my mind to see spelling mistakes and grammar errors as inherent authenticity. Because one thing ChatGPT and its ilk cannot do -- I guess architecturally —- is act convincingly like those who misspell, accidentally coin new eggcorns, accidentally use malapropisms, or use novel but terrible grammar.
And you're right: IMO the rage against the cultural damage AI will do is only just beginning, and I don't think people have clocked on to the fact that economic havoc is built-in, success or failure.
The web/AI/software-tech industry will be loathed even more than it is now (and this loathing is increasingly justified)
> one thing ChatGPT and its ilk cannot do -- I guess architecturally —- is act convincingly like those who misspell, accidentally coin new eggcorns, accidentally use malapropisms, or use novel but terrible grammar
Just wait a few more years until the majority of ChatGPT training data is filled with misspellings, accidental eggcorns, malapropisms and terrible grammar.
It is of no cost to me when someone else writes a book, plays a song or draws a picture. It is also true that, basically whatever I ever do, someone else has done better. This does not stop me from doing those things because the value within them is in doing them.
We have cars, buses and planes, yet people do partake in pilgrimages. The process matters, even if only personally.
> folks will just no longer be motivated to produce real original content.
Honestly if your only motivation for creating art was “computers can’t do what I do” then… I don’t want to be too gatekeepy about it, but that doesn’t sound like you’re a ‘real’ artist to me. Real artists create art because they enjoy doing it, not because it’s the exclusive domain of humans.
You don’t need to be special, you don’t need to be the best, you don’t need to even be good or successful or recognized or appreciated (although of course all those things are nice) - you just have to be creating art.
AI slop is like 90’s websites and desktop publishing - there’s a novelty for AI-newbie-creators driving them to churn out lazy crap, while being oblivious to how it lands with strangers.
Tastes will mature, society will more vocally mock this crap, and we’ll stop seeing the sloppier stuff come out of reputable locations.
You assume that the public recognizes AI slop for what it is. Across platforms now, people are readily engaging with blatant AI text posts and generated images as if they are bona fide. In fact, if you point out that the poster is a bot, you may well get some flak from the community.
Plastic/synthetics are the slop of the physical world. They're a side product of extracting oil and gas so they're extremely cheap.
Yet if you look at synthetics by volume, probably 99% of them are used just because they're cheaper than the natural alternative. Yes, some have characteristics that are novel, but by and large everything we do with plastics is ultimately based on "they're cheaper".
> With AI slop showing up everywhere, there’s a real danger that folks will just no longer be motivated to produce real original content.
I think people would still produce original things as long as they have the means for doing it. I guess we could say it is our nature. My fear is AI monopolizing the wealth that once would go to support people producing art.
This. I still produce original things and will continue to do so until I am incapable anymore. What's changed, though, is that I no longer put or discuss those things on the open internet because there's no realistic way to prevent it from getting used to train genAI models.
Other countries are currently going through the same. KODA is running a similar lawsuit on behalf of the Danish musicians, they can now point to Germany as an example, making it much easier for them to win.
These people would stream German schlager to every screen and speaker in Europe and charge for it 100 EUR monthly per breathing person, if they could. They are violent.
However, the lyrics are shown because the user requested them, so shouldn't the user be liable instead? The same way social networks are not liable for content uploaded by users? I think there is something of a double standard here.
Of course, maybe OpenAI et al. should have gotten a license before training on the lyrics, or avoided training on copyrighted content. But the first would be expensive and the latter would require them to develop actual intelligence.
Why should the user be liable? They didn't reproduce the copyrighted work and the machine is totally capable of denying output (like it already does for other categories of material).
At the very least, the users being liable instead of OpenAI makes no sense. Like arresting only drug users and not dealers.
There are countries where drug consumption/possession is penalized too. There is a similar example in another area: for instance, in Sweden, Norway and Belize, selling sex (aka prostitution) is legal, but buying it is not. So, your example actually exists in world legislation.
I'm just asking where are we going to put the line and why.
> However, the lyrics are shown because an action is the user so, shouldn't be the user be liable instead?
Same goes for websites where you can watch piracy streams. "The action is the user pressing play" sounds like it might win you an internet argument, but I'm 99% sure none of the courts will play those games; you, as the operator who enabled whatever the user could do, end up liable.
I think that is completely different. Piracy websites do only one thing. Chatbots are different.
My concern is where we are going to draw the line: If I type a copyrighted song in Word, is Microsoft liable? If I upload a lyric to ChatGPT and ask it to analyze or translate it, is it a copyright violation?
I totally understand your line of thinking. However, the one I'm suggesting could be applied as well and it has precedents in law (intellectual authors of crimes are punishable, not only the perpetrators).
If that were the case, Google wouldn't receive DMCA takedowns for piracy links; it would instead be asked to offer up the users searching for pirated content. The former is more prevalent than the latter because
one, going after users requires an invasion of privacy - you have to serve up everyone's search results
two, it requires understanding of intent.
The same issue applies here. OpenAI would need to share all chats for courts to sift through, and then there is the question of how to judge intent. If someone asks for a German pop song and OpenAI decides to output Bochum - whose fault is that?
Member when music sites were suing YouTube for music videos, and now they are begging people to watch them there and YT view counts are a bragging topic?
Soon music industry will be begging OpenAI for exposure of their content, just like the media industry is begging Google for scraping.
The obsession with protecting access to lyrics is one of the strangest long-running legal battles to me. I will skip tracks on Spotify sometimes specifically because there are no lyrics available. Easy access to lyrics is practically an advertisement for the music. Why do record companies not want lyrics freely available? In most cases, it means they aren't available at all. How is that a good business decision?
They probably fear a domino effect if they let go of this. And so they defend it vehemently to avoid setting a precedent.
Think about compositions, samples, performance rights, and so on. There is a lot more at stake.
What's the benefit of protecting monetary IP rights to art?
We'll only get the art that artists really wanted to make? Great!
Hot take: it’s all bullshit.
Like software patents - when you’re not a normie.
Thoughts by someone who doesn’t make a living by songs?
I’m guessing you’d want to restrict lyrics to encourage more plays of the song by people who are motivated to understand them. Along with the artist’s appreciation of that experience of extracting what you’re fascinated by. Burdensome processes generate love and connection to things.
Not everything is a functional commodity to be used and discarded at whim.
One amusing part of lyrics on Spotify to me is how they don't seem to track which songs are instrumentals or not and use that to skip the message about them not knowing the lyrics. An instrumental will pop up and it will say something like "Sorry, we don't have the lyrics to this one yet".
The only thing funnier than that is when they do have the lyrics to a song that probably doesn't need them, like Hocus Pocus by Focus: https://open.spotify.com/track/2uzyiRdvfNI5WxUiItv1y9?si=7a7...
Oh they track that, it's in their API as the "instrumentalness" score: https://developer.spotify.com/documentation/web-api/referenc...
The fact that they don't do anything with that information is unrelated.
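For illustration, here is a minimal sketch of how a client could use that score to avoid the misleading placeholder. It assumes the track's audio features arrive as a dict with an `instrumentalness` value between 0.0 and 1.0; the function name, the 0.5 cutoff, and the "appears to be instrumental" message are made up for this example and are not taken from Spotify's code.

```python
from typing import Optional

# Assumed cutoff for treating a track as instrumental; tune to taste.
INSTRUMENTAL_THRESHOLD = 0.5


def lyrics_placeholder(audio_features: dict, has_lyrics: bool) -> Optional[str]:
    """Return the message for the lyrics panel, or None if lyrics should be shown."""
    if has_lyrics:
        return None
    # Treat a missing score as "not instrumental" so the default message is kept.
    score = audio_features.get("instrumentalness", 0.0)
    if score >= INSTRUMENTAL_THRESHOLD:
        return "This track appears to be instrumental."
    return "Sorry, we don't have the lyrics to this one yet."
```

The score is a model estimate rather than a label, so a track with a few spoken words could still land on either side of the cutoff.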
I’ve also seen cases where they list lyrics for a song that doesn’t have any (usually an instrumental jazz version of an old standard).
The content industries should have been the ones to invent LLMs, but their head is so stuck in the past and in regressive thinking about how they protect their revenue streams that they're incapable of innovating. Publishing houses should have been the ones to have researchers looking into how to computationally leverage their enormous corpus of data. But instead, they put zero dollars into actual research and development and paid the lawyers instead. And so it leads to attitudes like this.
The only people seeing themselves as "content creators" are people giving social media stuff so their users get something they can doom scroll. Other people see themselves as artists, entertainers, musicians, authors, etc.
“The content industries.”
Why would people invest in destroying what they love?
> The content industries should have been the ones to invent LLMs
While exclusively-controlled LLMs would be mildly useful to them, the technology existing is dangerous to them, and they already have a surplus supply of content at low cost that they monetize by controlling discovery, gatekeeping, and promotion, so I don't think it makes sense for them to put energy into LLMs even if they had the technical acumen to recognize the possibilities (much the same way that Google, despite leading in developing the underlying technology, had very little incentive to productize it since it was disruptive to their established business, until someone else already did and the choice was to compete on that or lose entirely.)
That's always been the case, e.g. how they were latecomers to streaming.
The composition and lyrics are owned separately from the recorded performance.
I'm pretty sure you could even have lyrics with a separate copyright from the composition itself. For example, you can clearly have lyrics without the music and you can have the composition alone in the case that it is performed as an instrumental cover or something.
This is a tough one for the HN crowd. It's like that man not sure which button to push meme.
1) RIAA is evil for enforcing copyrights on lyrics?
2) OpenAI is evil for training on lyrics?
I know nuance takes the fun out of most online discussions, but there's a qualitative difference between a bunch of college kids downloading MP3s on a torrent site and a $500 billion company whose goal, among other things, is to become the primary access point to all things digital.
Why not both? As the GP mentioned, lyrics are also invaluable for people besides training for AI.
I think you mean the RIAA
RAII is a different kind of (necessary) evil
3) Some types of data are more ethical to train on than others.
Training on Wikipedia? Cool! Training on pirated copies of books? Not cool! Training on lyrics? IMO that's on the "cool" side of the line, because the "product" is not the words, it's the composition and mastered song.
Very true. Just the other day, there was another “copyright is bad” post on the front page. Today it's “copyright is good”, because otherwise people might get some use out of the material in LLMs.
Considering this is Hacker News, it seems to be such an odd dichotomy. Sometimes it feels like anti-hacker news. The halcyon days of 2010 are long gone. Now we apparently need to be angry at all tech.
LLMs are amazing and I wish they could train on anything and everything. LLMs are the smartphone to the fax machines of Google search.
Actually in Germany it's GEMA
It's a good decision because it must be an incredible minority of people who only listen to music when the lyrics can be displayed. I'd imagine most people aren't even looking at the music playing app while listening to music. Regardless, they are copyrighted and they get license fees from parties that do license them and they make money that way. Likely much more money than they would make from the streams they are losing from you.
I think it depends on the music. Most people will have a greatly improved experience when listening to opera if they have access to (translated) lyrics. Even if you know the language of an opera, it can be extremely difficult for a lot of people to understand the lyrics due to all the ornamentation.
I think having the lyrics reproducible in text form isn't the problem. Many sites have been doing that for decades and as far as I know record companies haven't gone after them. But these days with generative AI, they can take lyrics and just make a new song with them, and you can probably see why artists and record companies would want to stop that.
Plus, from TFA,
"GEMA hoped discussions could now take place with OpenAI on how copyright holders can be remunerated."
Getting something back is better than nothing
I didn't downvote, but
> I think having the lyrics reproducible in text form isn't the problem. Many sites have been doing that for decades and as far as I know record companies haven't gone after them.
Reproducing lyrics in text form is, in fact, a problem, independent of AI. The music industry has historically been aggressively litigious in going after websites which post unlicensed song lyrics[0]. There are many arcane and bizarre copyright rules around lyrics. E.g., if you've ever watched a TV show with subtitles where there's a musical number but none of the lyrics are subtitled, you might think it was just laziness, but it's more likely the subtitlers didn't have permission to translate and subtitle the lyrics. And many songs on Spotify which you'd assume would have lyrics available just don't, because Spotify doesn't have the rights to publish them.
[0] https://www.billboard.com/music/music-news/nmpa-targets-unli...
Had a couple of drive-by downvotes... Is it that stupid an opinion? Granted I know nothing about the case except for what's in TFA
Simon Willison had an analysis of Claude's system prompt back in May. One of the things that stood out was the effort they put in to avoiding copyright infringement: https://simonwillison.net/2025/May/25/claude-4-system-prompt...
Everyone knows that these LLMs were trained on copyrighted material, and as a next-token prediction model, LLMs are strongly inclined to reproduce text they were trained on.
All AI companies know they're breaking the law. They all have prompts effectively saying "Don't show that we broke the law!". That we continue to have tech companies consistently breaking the law and nothing happens is an indictment of our current economy.
And it's a question of whether we accept breaking the law for the possibility of having the greatest technological advancement of the 21st century. In my opinion, the legal system has become a blocker for a lot of innovation, not only in AI but elsewhere as well.
The whole industry is based on breaking the law. You don’t get to be Microsoft, Google, Amazon, Meta, etc. without large amounts of illegality.
And the VC ecosystem and valuations are built around this assumption.
I don’t read this as “don’t show we broke the law,” I read it as “don’t give the user the false impression that there’s any legal issue with this generated content.”
There’s nothing law breaking about quoting publicly available information. Google isn’t breaking the law when it displays previews of indexed content returned by the search algorithm, and that’s clearly the approach being taken here.
This is incorrect. Two judges have now ruled that training on copyrighted data is fair use. https://www.whitecase.com/insight-alert/two-california-distr...
Training on copyright is not illegal. Even in the lawsuit against anthropic it was found to be fair use.
Pirating material is a violation of copyright, which some labs have done, but that has nothing to do with training AI and everything to do with piracy.
You can always vote, but there is always someone going through the back door paying politicians and judges.
and training on mountains of open source code with no attribution is exactly the same
the code models should also be banned, and all output they've generated subject to copyright infringement lawsuits
the sloppers (OpenAI, etc) may get away with it in the US, but the developed world has far more stringent copyright laws
and the countries that have massive industries based on copyright aren't about to let them evaporate for the benefit of a handful of US tech-bros
Post-trained models are strongly inclined to produce responses similar to whatever earned them a high RL score; it's slightly wrong to keep thinking of LLMs as just next-token prediction from the dataset's probability distribution, as if they were some Markov chain.
I found this bit very revealing:
> Since the output would only be generated as a result of user inputs known as prompts, it was not the defendants, but the respective user who would be liable for it, OpenAI had argued.
Another glimpse into the "mind" of a tech corporation allowing itself full freedom to profit from the vast body of human work available online, while explicitly declining any societal responsibility at all. It's the user's fault, he wrote an illegal prompt! We're only providing the "technology"!
This is largely how it works for nearly all copyrightable work. I can draw Mickey Mouse but legally I'm not doing anything wrong until I try to sell it. It certainly doesn't put Crayola or Adobe at legal risk for me to do so.
But you are not the one drawing Mickey Mouse in this scenario, are you? You are instructing the AI company to draw something or, closer to the original post, you are prompting it to generate the lyrics for song X.
Your prompt may be asking for something illegal (i.e. reproducing the lyrics), but the one reproducing the lyrics is the AI company, not you yourself.
In your example you are asking Adobe to draw Mickey Mouse and Adobe happily draws a perfect rendition of Mickey Mouse for you and you have to pay Adobe for that image.
Not really, if I ask an artist to draw me a Mickey Mouse (for money) who is committing copyright infringement?
It's an interesting observation that the big AI corps very much argue that learning "is the same that humans do", so fair use. But then when it comes to using that learning they argue the other way, i.e. "this is just a machine, it's the person asking who is doing the infringement".
While I partially understand (but not support) the hate against AI due to possible plagiarism and "low effort generation" of works, think about the whole process: If model providers become liable for generating output that resembles lyrics or very short texts that fall under copyright law, they will just change their business model.
E.g. why offering lame chat agents as a service, when you can keep the value generation in-house. E.g. have a strategy board that identifies possible use cases for your model, then spin off a company that just does agentic coding, music generation. Just cut off the end users/public from model access, and flood the market with AI-generated apps/content/works yourself (or with selected partners). Then have a lawyer checking right before publishing.
So this court decision may turn everything worse? I don't know.
The fact they don't already do that, sounds to me like the things produced by AI are not worth the investment. Especially since the output is not copyrightable, right?
If there was a lot of gold to find they wouldn't sell the shovels.
There is a lot of value in specialization. It allows capitalism to do its magic to elevate the best uses of your technology without yourself taking on any of the risk. Trying to inhouse everything often smothers innovation and leads to bad resource allocation. It can be done, but in fields with a lot of ongoing innovation it's extremely hard to get right
There is a reason that Cisco doesn't offer websites, and you are probably actively ignoring whatever websites your ISP has. ASML isn't making chips, and TSMC isn't making chip designs
If there is such immense value in spinning off and selling models separately, you can bet that will happen - without a court saying so. In the end, running these models is a costly job and you'd want to squeeze out every bit of value.
> Then have a lawyer checking right before publishing.
Your cheap app just got really expensive
> turn everything worse?
A media generation company that is forced to publish uncopyrightable works, because it cannot make access to these media generators public, since that would violate copyright - that does sound like a big win for everyone but that company.
How is that worse?
„Record companies“ without artists, but exclusive access to automated creation, selection and a working distribution.
> why offering lame chat agents as a service
Because that's the only business model that the management of these model provider companies suspect to have a chance of generating income, at the current state.
> While I partially understand (but not support) the hate against AI due to possible plagiarism
There's no *possible* plagiarism, all AI slop IS the result of plagiarism.
> E.g. have a strategy board that identifies possible use cases for your model, then spin off a company that just does agentic coding, music generation.
Having lame chat agents as a service does not preclude them from doing this. The fact that they are only selling the shovels should be somewhat insightful.
This sounds like a much more niche product that doesn't justify the over half-trillion dollars invested into it so far.
For AI to have a positive ROI, it has to be highly applicable to basically every industry, and has to be highly available.
I think in the end they will just pay off copyright holders. The German GEMA is mostly interested in rent-seeking through whatever means available, it's basically the whole point of the organization.
They'll easily be paid off once all legal avenues are exhausted for OpenAI. Though they'll of course keep fighting in court in the hopes of some more favorable negotiating position.
If the copyright costs get too high then we'll just use Chinese AI, unless they try to ban that, too.
You know, I'm a bit of a lyricist myself. These very words are lyrics to a tune in my head, and thus enjoy the increased legal protection of lyrics.
Another instance of GEMA fighting an American company. Anyone who was on the German internet in the first half of the last decade remembers the "not available in your country" error messages on YouTube because Google didn't make a deal with GEMA.
I don't think that we will end up here with such a scenario: lyrics are pervasive and probably also quoted in a lot of other publications. Furthermore, it's not just about lyrics but one can make a similar argument about any published literary work. GEMA is for music but for literary publications there is VG Wort who in fact already have an AI license.
I rather think that OpenAI will license the works from GEMA instead. Ultimately this will be beneficial for the likes of OpenAI because it can serve as a means to keep out the small players. I'm sure that GEMA won't talk to the smaller startups in the field about licensing.
Is this good for the average musician/author? These organizations will probably distribute most of the money to the most popular ones, even though AI models benefit from quantity of content rather than popularity.
https://www.vgwort.de/veroeffentlichungen/aenderung-der-wahr...
I am curious what happens if they call their bluff on this and cut off ChatGPT in Germany. Not that I think OpenAI is doing the right thing, just, I don’t think a country’s government can justify no commercial LLMs to its populace.
There are many competing providers of commercial LLMs with equal capabilities, so another vendor would probably be happy to serve a leading Western market of 83 million people.
Yeah? Which commercial provider’s model do you think was trained without using lyrics?
There are 80 million Germans. If you were OpenAI, or its shareholders, would you leave that market open for a competitor? No, you'd make a version of your product without the lyrics. More EU countries are going to follow and reach the same conclusion, especially now that Germany has set a legal precedence. Should OpenAI just pull out of a market with 500 million people and leave it to Claude, Perplexity or someone else entirely?
It doesn't appear that modern LLMs are really that hard to build, expensive perhaps, but if you have monopoly on a large enough market, price isn't really your main concern.
> More EU countries are going to follow and reach the same conclusion, especially now that Germany has set a legal precedence.
That's not how laws and regulations work in European or even EU countries. Courts/the legal system in Germany can not set legal precedents for other countries, and countries don't use legal precedents from other countries, as they obviously have different laws. It could be cited as an authority, but no one is obligated to follow that.
What could happen for example, would be that EU law is interpreted through the CJEU (Court of Justice of the European Union), and its rulings bind EU member states, but that's outside of what individual countries do.
Sidenote, I'm not a native English speaker, but I think it's "precedent", not "precedence"; similar words, but the first one is specifically what I think you meant.
This assumes that tech companies can act above the law because they've got a new feature to jam down our throats. Have you considered that not everyone wants that? Or that it might not be the best thing?
> Have you considered that not everyone wants that? Or that it might not be the best thing?
Did I suggest either of those things?
Conversely, last week we had Spain being willing to cut off Cloudflare (!) to protect football match royalties.
> I don’t think a country’s government can justify no commercial LLMs to its populace.
Counter-argument: can any country's government justify allowing its population and businesses to become completely dependent on an overseas company which does not comply with its laws? (For Americans, think "China" in this case)
I'm curious why you think the rule of law is a bluff.
I come from the country with the world’s oldest continuous parliament, and they change the law all the time. Arguably that’s all the majority of politicians do.
Probably pattern recognition
German student performance will plateau, while all other countries slowly decline.
AI is actively harming kids' abilities while inflating their grades when they make AI do their homework.
German student performance may plateau, but when student performance in other countries falls, that still leaves them in a better place.
> cut off ChatGPT in Germany
God I can only hope
first, due to how the EU unified market works, they would have to cut it off from all of the EU, not just Germany
second, it would probably be good for the EU and even the US, as it would de-monopolize the market a bit before that becomes fully impossible
Claude and Gemini would become more popular.
> I don’t think a country’s government can justify no commercial LLMs to its populace
They're not saying no LLMs, they're saying no LLMs using lyrics without a license. OpenAI simply need to pay for a license, or train an LLM without using lyrics.
But lyrics are just one example. Are you saying that training experiments must filter out all substrings from the training input that bear too close a resemblance to a substring of a copyrighted work?
Oi, you got a loisense to read those words and then repeat them back to me when asked?
This obviously applies to all copyrighted works. I could sue OpenAI when it reproduces my source code that I published on the Internet.
They already "filter" the code to prevent it from happening (reproducing exact works). My guess it is just superficially changing things around so it is harder to prove copyright violations.
I would think as a matter of practice AI companies would attempt to detect long strings that appeared frequently in their corpus and dedup them out. There isn’t any value in training over and over again on the same data, and the copyright danger of being able to exactly reproduce your training set is obvious. Perhaps they did it intentionally, using the ability to reproduce copyrighted material as a way to get customers early on, knowing they would have to pay a paltry fee for it later.
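To make the dedup idea concrete, here is a rough sketch, not a description of what any lab actually runs. It flags word n-grams that show up in many documents and keeps only the first document carrying each such span. The function names, the window size `n`, and the `min_docs` threshold are assumptions chosen for illustration; real pipelines typically use hashing or suffix arrays to do this at scale.

```python
from collections import Counter


def word_ngrams(text, n):
    """Set of all runs of n consecutive lowercase words in text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def frequent_ngrams(docs, n=8, min_docs=3):
    """N-grams that occur in at least min_docs different documents."""
    counts = Counter()
    for doc in docs:
        # word_ngrams returns a set, so each document counts once per n-gram.
        counts.update(word_ngrams(doc, n))
    return {gram for gram, c in counts.items() if c >= min_docs}


def drop_repeated_docs(docs, n=8, min_docs=3):
    """Keep the first document containing each frequent span, drop later ones."""
    frequent = frequent_ngrams(docs, n, min_docs)
    seen = set()
    kept = []
    for doc in docs:
        hits = word_ngrams(doc, n) & frequent
        if hits & seen:
            continue  # shares a widely repeated span with a document already kept
        seen |= hits
        kept.append(doc)
    return kept
```

Note that this still leaves one copy of every repeated span in the corpus, so it reduces memorization pressure rather than removing the text.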
Of course the models are not human, but if you consider this situation as if they are persons, then the question becomes: May a person read lyrics and tell it to someone when asked, and the court's ruling basically says no, this may not happen, which makes little sense.
I guess the main difference between the situation with language models and humans is one of scale.
I think the question should be viewed like this, if I as a corporation do the same thing but just with humans, would it be legal or not. Given a hypothetical of hiring a bunch of people, having them read a bunch of lyrics, and then having them answer questions about lyrics. If no law prohibits the hypothetical with people, then I don't see why it should be prohibited with language models, and if it is prohibited with people, then there should be no specific AI ruling needed.
All this being said, Europe is rapidly becoming even more irrelevant than it was, living off the largesse of the US and China; it's like some uncontacted tribe ruling that satellites can't take aerial photos of them. It's all good and well, just irrelevant. I guess Germany can always go the route of North Korea if they want.
> "May a person read lyrics and tell it to someone when asked"
If you sell tickets to an event where you read the lyrics aloud, it's commercial performance and you need to pay the author. (Usually a cover artist would be singing, but that's not a requirement.)
So it's not like a human can recite the lyrics anywhere freely either.
You don't even have to sell tickets: if it's a free concert, copyright is likely infringed. This is likely true in all jurisdictions.
If someone hires me as a secretary, and they ask me what is the lyrics of a song, there is no law that prohibits me from telling them if I know and I don't have to license the lyrics in order to do so.
If they hire me primarily to recite lyrics, then sure, that would probably be some manner of infringement if I don't license them. But I feel like the case with a language model is much more the former than the latter.
> May a person read lyrics and tell it to someone when asked, and the court's ruling basically says no, this may not happen, which makes little sense.
I think the difference here is that your example is what a search engine might do, whereas AI is taking the lyrics, using them to create new lyrics, and then passing them off as its own.
> whereas AI is taking the lyrics, using them to create new lyrics, and then passing them off as its own.
Is this not something every single creative person ever has done? Is this not what creating is? We take in the world, and then create something based on that.
A key takeaway from this ruling is that "the systems contain copies of the original works." Does this mean that offering any open-weight model capable of reproducing copyrighted text snippets or lyrics will be prohibited? That would be a big setback for AI development in the EU.
That's what the lawsuit of the New York Times is about - OpenAI reproducing complete texts of NYT articles without paying for the reproduction of said articles. This is not an EU issue, but a general unsolved legal grey zone for the whole AI market.
Can't they just ask for copies of the lyrics they are not allowed to use and s/lyrics//g the training set? I imagine the volume of text that will be removed would be relatively minuscule.
They should ask for lyrics they are allowed to use. The volume of the text that's left would be minuscule.
That's not a solution for the same reason I'm not allowed to pirate unless movie studios personally ask me not to do so.
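Leaving aside who should have to ask whom, the filtering step proposed upthread is mechanically simple. Here is a minimal sketch, assuming the rights holder supplies the protected texts: build a blocklist of normalized word 5-grams and drop any training document that overlaps it. All names here (`build_blocklist`, `filter_corpus`, `max_overlap`) and the window size are hypothetical, and the test text is a placeholder rather than real lyrics.

```python
import re


def normalize(text):
    """Lowercase and strip punctuation so trivial edits don't defeat matching."""
    return re.findall(r"[a-z0-9']+", text.lower())


def shingles(text, n=5):
    """Set of all runs of n consecutive normalized words."""
    words = normalize(text)
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def build_blocklist(protected_texts, n=5):
    """Union of the shingles of every protected text."""
    blocked = set()
    for text in protected_texts:
        blocked |= shingles(text, n)
    return blocked


def filter_corpus(docs, blocked, n=5, max_overlap=0.0):
    """Keep documents whose share of blocked shingles is at most max_overlap."""
    kept = []
    for doc in docs:
        grams = shingles(doc, n)
        if not grams:
            kept.append(doc)  # too short to match anything
            continue
        overlap = len(grams & blocked) / len(grams)
        if overlap <= max_overlap:
            kept.append(doc)
    return kept
```

This only catches near-verbatim copies in the input; it does nothing about text a model has already been trained on, and with `max_overlap=0.0` it would also throw out pages that merely quote a line.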
It would be so hilarious if GEMA was actually useful for once and not a detriment to society and artists in general.
However of course OpenAI will ignore this and at worst nothing will change and at best they get a slap on the wrist and a fine and continue scraping.
You can’t take that stuff out of the models at this point anyway.
I made a living from GEMA payments some while back, but dear lord, so much of how the institution does what it does feels so bad and zero-sum. Might just be that the world would be better off without it. It does something important for right holders for sure, but (and I understand, I am heavily back-seating here without offering a solution) there must be better ways to go about it.
Now, without the Filmförderung, all those grimdark arthouse movies where people yell "Scheisse!" in Berlin stairwells would never be made. And all that shovelware made to please public committees, looking extra cute and boring, clogging up the app stores with zero sales: what would we do without that? Take anything popular on streaming and ask yourself whether it would have gotten through. And if it was stopped, by what and by whom? Fire that, to fix Germany's media sector.
Nah. It’s so easy for OpenAI to modify their output. I’m already seeing them restrict news article re-generation by newspaper name. They do it to reduce liability. There’s also a big copyright infringement case coming up in the USA this year, and being able to point to responsiveness to complaints will be a key part of their legal defense I bet.
You can modify the output but the underlying model is always susceptible to jail breaks. A method I tried a couple months ago to reliably get it to explain to me how to cook meth step by step still works. I’m not gonna share it, you just have to take my word on this.
It'd be equally hilarious if that VC money would be used to actually better society by crushing GEMA in court.
But realistically, all that will happen is that the "Pauschalabgabe" is extended to AI subscriptions, making stuff more expensive for everyone.
Damn I didn’t even consider the second part…
I'm not sure what the problem is here: lyrics are public, and you can search '$songname lyrics' and get the result on a website (or even on the search engine page). What's the issue with an LLM producing those lyrics if you ask?
They aren't! They're subject to licensing!
https://www.digitaltrends.com/social-media/rap-genius-deserv... (2013)
Long ago the first site I remember to do this was lyrics.ch, which was long since shut down by litigation. I'm not endorsing the status quo here, but if the licensing system exists it is obviously unfair to exempt parties from it simply because they're too big to comply.
Just because you can find them freely online doesn't make them public in the legal sense. If that was the case music piracy would also be legal.
I am torn because on one hand, fuck record companies. On the other hand, fuck AI companies torrenting, stealing and defrauding.
Yeah it is tricky in the current climate who to say “fuck you” to first. GEMA does at least represent human artists a bit. Nobody I know in music or any other creative industries has given a blanket allowance for AI companies hoovering up their artworks to then regurgitate for profit. Pirating music to hear it is one thing. Cloning it with modifications to resell is a whole other thing.
Copyright law continues to stifle innovation. The DMCA needs to be abolished, and we need to entirely rethink our modern economic system with respect to creative industries (including software development). The cat is not going back in the bag.
We are sitting on the precipice of the greatest technological advancement in history, and rent-seeking industry titans have convinced us that we must stop this unstoppable technological advancement in order to protect the livelihood of artists who already receive cents on the dollar for their efforts.
Doesn't that sound familiar? This is what they have done time and again, and each time they have lost, leading to a huge loss of potential revenue for creatives as people make use of technological breakthroughs.
Then when companies like Netflix finally get everyone on board with streaming and paying for content with modern conveniences, industry titans step back in to demand larger slices of the pie, until the entire system is ruined and people return to piracy and consumption of older media. Don't even get me started on Spotify.
The technological benefit of modern machine learning models is just too large to ignore. These are becoming important tools, which put power back in the hands of the people, of the consumer. A lot of the grassroots anti-AI movements we see in the creative space can be traced back to corporate propaganda or financial backing. A lot of these people really think they're doing what's best for artists. But I just see Blockbuster all over again. We should make an effort not to be on the wrong side of history.
Ah yes, the protection that also protects me, the small person, should I write a song, write a book, come up with a compelling software concept, or come up with a way to improve food growth, should go away because it's 'rent seeking', in order to be replaced with.... rent-seeking, trillion-dollar-valuation tech companies?
Prior to the current tech bro economy, one of the number one ways average people moved up to being rich in the USA was enabled purely by the copyright and patent law protections you want to do away with.
I said nothing about patents (though I definitely have feelings about software patents you probably wouldn't like), I simply stated that it is farcical and dishonest to build an economic system predicated upon the restriction of first-amendment rights of consumers, such as the DMCA which prevents me from making copies of my files and sharing them with other people, which, outside of national security threats, is an ethically bankrupt proposition.
The reality is that these small people of which you speak are still beholden to a rent-seeking industry that exploits artists en masse. Most creatives historically got next to nothing. Most artists don't get anything close to rich off of their work.
Yet today, we are able to directly support creatives, and many creatives do quite well by managing the long tail and curating a small, but dedicated patronage. Many of these creatives make more money with such a model, while still allowing their work to be shared with proper attribution.
We have just been brainwashed by a century of corporate interests sticking their hand into every facet of the creative industry, convincing us that the systems they've built over decades are the only way for things to be, even if it infringes upon the rights of others.
Also, while I develop software as a trade, I have also been an artist my entire life, working across many mediums, and so my opinions about the intersection of creativity and technology are not just those of some "tech bro", and I don't think that kind of framing is productive or fair. Especially considering I grew up poor and homeless as a teenager, and have had to reckon with the economic prospects of which you speak much more closely than most.
Lyrics produced some of the first AI slop I noticed after ChatGPT was launched in late 2022, even if the large models hadn’t been trained on them specifically. Overnight there were a bunch of different advertising-laden sites that clearly scraped Genius or other lyric websites, and then had GPT generate commentaries on what the lyrics supposedly mean, so that these would get picked up by search engines.
The result was mostly comical, the commentaries for vacuous pop music all sounded more or less the same: “‘Shake Your Booty’ by KC and the Sunshine Band expresses the importance of letting one’s hair down and letting loose. The song communicates to listeners how liberating it is to gyrate one’s posterior and dance.” Definitely one of the first signs that this new tech was not going to be good for the web.
[flagged]
Please stop posting LLM-generated comments to HN.
No need to leave a comment in reply to such generated text. Just email the mods directly with a link to the username, they zap such accounts daily.
I feel compelled to support banning AI from infringing on art, even though most pop songs are terrible.
"pop" music had its own avalanche of slop long before the advent of AI. Soulless reproductions and remixes of once-popular songs are everywhere.
With AI slop showing up everywhere, there’s a real danger that folks will just no longer be motivated to produce real original content.
With all major models now basically trained on nearly all available data, beyond the financial AI bubble about to burst there’s also a big content bubble that’s just about exhausted, as folks are just pumping out slop vs producing original creative human output. That may be the ultimate long term tragedy of the present AI hype cycle. Expect “made by a human” to soon be a tag associated with premium brands and customer experiences.
> Expect “made by a human” to soon be a tag associated with premium brands and customer experiences.
I went to a grammar school and I write in mostly pretty high-quality sentences with a bit of British English colloquialism. I spell well, spend time thinking about what I am saying and try to speak clearly, etc.
I've always tried to be kind about people making errors but I am currently retraining my mind to see spelling mistakes and grammar errors as inherent authenticity. Because one thing ChatGPT and its ilk cannot do -- I guess architecturally -- is act convincingly like those who misspell, accidentally coin new eggcorns, accidentally use malapropisms, or use novel but terrible grammar.
And you're right: IMO the rage against the cultural damage AI will do is only just beginning, and I don't think people have clocked on to the fact that economic havoc is built-in, success or failure.
The web/AI/software-tech industry will be loathed even more than it is now (and this loathing is increasingly justified)
> one thing ChatGPT and its ilk cannot do -- I guess architecturally -- is act convincingly like those who misspell, accidentally coin new eggcorns, accidentally use malapropisms, or use novel but terrible grammar
Just wait a few more years until the majority of ChatGPT training data is filled with misspellings, accidental eggcorns, malapropisms and terrible grammar.
That, and AI slop itself.
It is of no cost to me when someone else writes a book, plays a song or draws a picture. It is also true that, basically whatever I ever do, someone else has done better. This does not stop me from doing those things because the value within them is in doing them.
We have cars, buses and planes, yet people do partake in pilgrimages. The process matters, even if only personally.
> folks will just no longer be motivated to produce real original content.
Honestly if your only motivation for creating art was “computers can’t do what I do” then… I don’t want to be too gatekeepy about it, but that doesn’t sound like you’re a ‘real’ artist to me. Real artists create art because they enjoy doing it, not because it’s the exclusive domain of humans.
You don’t need to be special, you don’t need to be the best, you don’t need to even be good or successful or recognized or appreciated (although of course all those things are nice) - you just have to be creating art.
AI slop is like 90’s websites and desktop publishing - there’s a novelty for AI-newbie-creators driving them to churn out lazy crap, while being oblivious to how it lands with strangers.
Tastes will mature, society will more vocally mock this crap, and we’ll stop seeing the sloppier stuff come out of reputable locations.
You assume that the public recognizes AI slop for what it is. Across platforms now, people are readily engaging with blatant AI text posts and generated images as if they are bona fide. In fact, if you point out that the poster is a bot, you may well get some flak from the community.
I will not stop writing music or drawing my furry bullshit, no matter the culture climate around me. Don't get your hopes up ;3
When you're the only one doing it, you'll have a large impact on model generation
We already have this in the physical world.
Plastic/synthetics are the slop of the physical world. They're a side product of extracting oil and gas so they're extremely cheap.
Yet if you look at synthetics by volume, probably 99% of them are used just because they're cheaper than the natural alternative. Yes, some have characteristics that are novel, but by and large everything we do with plastics is ultimately based on "they're cheaper".
Plastics, unfortunately, aren't going away.
> With AI slop showing up everywhere, there’s a real danger that folks will just no longer be motivated to produce real original content.
I think people would still produce original things as long as they have the means for doing it. I guess we could say it is our nature. My fear is AI monopolizing the wealth that once would go to support people producing art.
This. I still produce original things and will continue to do so until I am incapable anymore. What's changed, though, is that I no longer put or discuss those things on the open internet because there's no realistic way to prevent it from getting used to train genAI models.
> With AI slop showing up everywhere, there’s a real danger that folks will just no longer be motivated to produce real original content.
The BBC truly was ahead of its time with their deletion of TV shows.
*edit. Will this actually change OpenAI's behaviour to any meaningful extent?
Other countries are currently going through the same. KODA is running a similar lawsuit on behalf of the Danish musicians, they can now point to Germany as an example, making it much easier for them to win.
Does what a US court rules really matter?
Probably not for something like this honestly. I feel like it would just keep getting appealed up. But what do I know? I'm not an attorney.
It does in Germany? And quite likely in the rest of the EU?
I guess. But I doubt OpenAI will change its behaviour due to this.
There's a major risk to being the market leader in a new, controversial technology. Look what happened to Juul
Highly addictive nicotine formulations targeted at teens are not exactly “new technology”.
These people would stream German schlager to every screen and speaker in Europe and charge for it 100 EUR monthly per breathing person, if they could. They are violent.
However, the lyrics are shown because the user requested them, so shouldn't the user be liable instead? The same way social networks are not liable for content uploaded by users? I think there is something of a double standard here.
Of course, maybe OpenAI et al. should have gotten a license before training on the lyrics, or avoided training on copyrighted content. But the former would be expensive and the latter would require them to develop actual intelligence.
Why should the user be liable? They didn't reproduce the copyrighted work and the machine is totally capable of denying output (like it already does for other categories of material).
At the very least, the users being liable instead of OpenAI makes no sense. Like arresting only drug users and not dealers.
There are countries where drug consumption/possession is penalized too. There is a similar example in another area: for instance, in Sweden, Norway and Belize, selling sex (aka prostitution) is legal, but buying it is not. So, your example actually exists in world legislation.
I'm just asking where we are going to draw the line and why.
> However, the lyrics are shown because of an action by the user, so shouldn't the user be liable instead?
Same goes for websites where you can watch piracy streams. "The action is the user pressing play" sounds like it might win you an internet argument, but I'm 99% sure none of the courts will play those games. You, as the operator who enabled whatever the user could do, end up liable.
I think that is completely different. Piracy websites do only one thing. Chatbots are different.
My concern is where we are going to draw the line: If I type a copyrighted song in Word, is Microsoft liable? If I upload a lyric to ChatGPT and ask it to analyze or translate it, is it a copyright violation?
I totally understand your line of thinking. However, the one I'm suggesting could be applied as well and it has precedents in law (intellectual authors of crimes are punishable, not only the perpetrators).
This is such a bad take.
If that were the case, Google wouldn't receive DMCA takedowns for piracy links; it would instead have to offer up the users searching for pirated content. The former is more prevalent than the latter because, one, the latter requires an invasion of privacy (you have to serve up everyone's search results) and, two, it requires an understanding of intent.
The same issue applies here. First, OpenAI would need to share all chats for courts to sift through, and second, how do you judge intent? If someone asks for a German pop song and OpenAI decides to output Bochum - whose fault is that?
Member when music sites were suing YouTube for music videos, and now they are begging people to watch them there and YT view counts are a bragging topic?
Soon music industry will be begging OpenAI for exposure of their content, just like the media industry is begging Google for scraping.
That's exactly the difference between using with or without license.
YouTube pays the music owner. OpenAI can never pay, as even while stealing content they still manage to lose 5 dollars for every dollar they make.