Comment by crazygringo

4 days ago

This is insane.

I definitely was not aware Spotify DRM had been cracked to enable downloading at scale like this.

The thing is, this doesn't even seem particularly useful for average consumers/listeners, since Spotify itself is so convenient, and trying to locate individual tracks in massive torrent files of presumably 10,000's of tracks each sounds horrible.

But this does seem like it will be a godsend for researchers working on things like music classification and generation. The only thing is, you can't really publicly admit exactly what dataset you trained/tested on...?

Definitely wondering if this was in response to desire from AI researchers/companies who wanted this stuff. Or if the major record labels already license their entire catalogs for training purposes cheaply enough, so this really is solely intended as a preservation effort?

> The thing is, this doesn't even seem particularly useful for average consumers/listeners, since Spotify itself is so convenient, and trying to locate individual tracks in massive torrent files of presumably 10,000's of tracks each sounds horrible.

I wouldn’t be so sure. There are already tools to automatically locate and stream pirated TV and movie content on demand. They’re so common that I had non-technical family members bragging at Thanksgiving about how they bought a box at their local Best Buy that has an app which plays any movie or TV show they want on demand without paying anything. They didn’t understand what was happening, but they said it worked great.

> Definitely wondering if this was in response to desire from AI researchers/companies who wanted this stuff.

The Anna’s Archive group is ideologically motivated. They’re definitely not doing this for AI companies.

  • > The Anna’s Archive group is ideologically motivated. They’re definitely not doing this for AI companies.

    They have a page directly addressed to AI companies, offering them "enterprise-level" access to their complete archives in exchange for tens of thousands of dollars. AI may not be their original/primary motivation, but they are evidently on board with facilitating AI labs' piracy-maxxing.

    • You go where the money is. Infra isn’t free. Churches pass the plate every Sunday. Perhaps one day we’ll exist in a more optimal socioeconomic system; until then, you do what you have to do to accomplish your goals (in this context, archiving and digital preservation).


    • Or they know that those parties are going to hammer their servers no matter what so they will at least try and get some money out of it.

    • That made me chuckle: "Enterprise Level Access". I mean, for an AI company that’s incredibly cheap; instead of torrenting something, why not just buy it? That price is just a fraction of an engineer’s salary.


    • I think there is a big legal difference between helping preserve books and papers with little regard for copyright, and then turning around and selling access to large companies.

    • So either these folks, who are admittedly living targets of all the world's copyright lawyers, have means to receive tens of thousands of USD anonymously and stealthily,

      or they are totally immune to deanon / getting tracked down,

      or they are stupid enough to allow their greed to become their downfall,

      or this legend about underground warriors of light fighting against evil copyrighters is utter bullshit.

  • > I had non-technical family members bragging at Thanksgiving about how they bought a box at their local Best Buy that has an app which plays any movie or TV show they want on demand without paying anything. They didn’t understand what was happening, but they said it worked great.

    Sounds like one of these: https://krebsonsecurity.com/2025/11/is-your-android-tv-strea...

    Probably not your problem to play tech support for these people and explain why being part of a botnet is bad, but mildly concerning nonetheless!

  • > The Anna’s Archive group is ideologically motivated.

    Very interesting, thank you. So using this for AI will just be a side effect.

    And good point -- yup, can now definitely imagine apps building an interface to search and download. I guess I just wonder how seeding and bandwidth would work for the long tail of tracks rarely accessed, if people are only ever downloading tiny chunks.

    • I think the people seeding these are also ideologues and so would be interested in also supporting the obscure stuff, maybe more than the popular. There is no way any casual listeners would go to the quite substantial trouble of using these archives.

      Anyone who wants to listen to unlimited free music from a vast catalog with a nice interface can use YouTube/Google Music. If they don't like the ads they can get an ad blocker. Downloading to your own machine works well too.

  • Spotify is $12/month at most to get unlimited ad-free access to virtually all music.

    To get access to "all" TV content legally would be hundreds of dollars a month. And for many movies you must buy/rent each individually. And legal TV and movies are much more encumbered by DRM and lock in, limiting the way you can view them. (like many streaming apps removing AirPlay support, or limiting you to 720p in some browsers)

    I think Spotify wins over pirating because of its relatively low cost and convenience. Piracy of TV and movies has increased as the cost of accessing them has.

    • It's not even close to virtually all music. 256M songs doesn't come even close.

      It's virtually all popular music recently published commercially in the world.

      It's missing large portions of bootlegs, old music, foreign music, radio shows, mixtapes and live streaming music to list a few prominent categories from music in my private archive of cultural works. Those categories, btw, are well represented by torrents on tracker sites.

    • > Spotify is $12/month at most to get unlimited ad-free access to virtually all music.

      Until they decide to silence the artist you want to listen to because emperor god trump decides to unperson them.

      Putting what music you listen to in the hands of a US corporation is such a dangerously stupid idea that it is amazing to me that there are people here who are OK with it.

      >I think Spotify wins over pirating because of its relatively low cost and convenience

      Spotify isn't "convenient" if you want to control and understand the media and software in your life. https://www.defectivebydesign.org/spotify


    • Barely all. I have so many songs in my playlist that have randomly become unavailable. It's quite frustrating, to be honest.

    • It's absolutely not all. I'm an extremely casual listener, not 'into' music or anything, and I have plenty of tracks in a playlist that have disappeared (mostly I don't even know what they are; they're just greyed out with no information) for whatever reason. And that's just the stuff that was there at some point that I liked.

      One of them has come back recently. It's still listed under the wrong artist (same name, but dead, vs. the active artist who actually performed it), but I'm not reporting it again because I suspect I may have made it disappear for a couple of years by doing so before.

      It's kind of crap and disorganised once you look past a bare glance, really; it must be infuriating for (or just not used by) people who are actually into it.

    • Spotify used to be good, but has enshittified its UI past the point of usability for me. It really wants to play me tracks that are profitable for Spotify, not tracks I want to hear.

      What you say is still true of the Amazon and Apple offerings, though. Haven't tried Youtube Music, so can't comment on that.


  • > There are already tools to automatically locate and stream pirated TV and movie

    Before we had Spotify we had Grooveshark. Streaming pirated content came first, and everything old is new again.

  • They’re doing it for everyone, so, yes, they are doing it for AI companies.

  • > They’re definitely not doing this for AI companies.

    So it's just yet another instance of enormous luck / annuit coeptis for the wealthy and powerful, then.

    Such lucky bastards. Whatever happens, does so to their benefit, and all inconvenient questions about the nature of their luck automatically recede into the conspiracy theory domain.

    And let's not forget that Anna's Archive is also the host to the world's largest pirate library of books and articles.

  • > The Anna’s Archive group is ideologically motivated.

    Anna’s Archive’s business is stealing copyrighted content and selling access to it. It's not ideologically motivated.

    What ideology is about pirating books and music, when most of the people producing this stuff cannot afford to do it full-time? It's not like pirating movies, software, or games from large studios: that is still piracy, but those companies make big money and don't always act in the interests of their users.

    Writers and musicians are mostly broke. If we sum the rising cost of living, AI-generated content and piracy, there's almost no reward left for their work. Anna’s Archive is contributing to the decline of art and culture. They sell you premium bandwidth for downloading and training your AIs on copyrighted content, so soon we can all generate more and more slop.

    • > Anna’s Archive’s business is stealing copyrighted content and selling access to it.

      There is not enough profit in that compared to the risk. They're also not exactly aggressive about it (there are groups which host mirrors who charge far more/finance it in the usual criminal way of getting people to install malware).

      To me, there's a "motivation gap" between what they get out of this and the effort it takes, so there's some kind of "ideology". Whether it's 100% what they say it is, is another question.

    • > Writers and musicians are mostly broke. If we sum the rising cost of living, AI-generated content and piracy, there's almost no reward left for their work.

      For authors (books) ~70% of all the book sales go to the publisher, not the author (trad pub): https://reedsy.com/blog/how-much-do-authors-make/

      For musicians: depending on how big a name you are and which publisher you choose, the publisher's share ranges from 15% (small name/indie) to 60% (big name/Universal, Sony) https://www.careersinmusic.com/music-publishing/

      This is an industry with profit maximising as its goal like every other industry. If artists are broke, first take a look at the publishers.

    • Agreed. I see far too many people rationalizing piracy as a principled thing to do. Instead of finding ways to improve the market such that the control of content isn't siloed in monopolistic corporations, many celebrate Anna's Archive, which is itself a more or less monopolistic, profit-interested entity. The major difference is that we don't have to pay directly. The cost continues to fall on the writers and artists, and the industry suffers.


  • > I wouldn’t be so sure. There are already tools to automatically locate and stream pirated TV and movie content on demand.

    It may be relevant for those people, but I lost all interest in current TV or streaming stuff. I just watch youtube regularly. What's on is on; what is not on is not really important to me. My biggest problem is lack of time anyway, so I try to reduce the time investment if possible, which is one huge reason why I have zero subscriptions. I just could not keep up with them.

Flippant response: If it's ok for Meta for commercial use, why not for researchers for legitimate research work?

More serious response: research is explicitly included in fair use protections in US copyright law. News organizations regularly use leaked / stolen copyrighted material in investigative journalism.

  • Because the laws are there to protect people with money from people who don't have money.

Arguably, the metadata is more useful than the music files themselves.

  • Self-supplied metadata in music catalogs is notoriously shit. The degree to which most rights owners don't give a damn is telling.

    Spotify's own metadata is not particularly sophisticated: "Valence", "Energy", "Danceability", etc. You can see from a mile away that these are names assigned to PCA axes, which actually correspond pretty poorly to musical concepts because whatever they analyzed isn't nicely linearly separable.

  • Especially since they scraped Spotify's popularity rating as well

    • I can't think of many situations where that would be particularly valuable, considering it favours recent plays and the cutoff date is already almost half a year old.


A little off topic, but I remain naively hopeful that the horror you describe will keep Spotify from going down the same road Netflix did once content owners decided to get into the streaming business themselves, so that streaming a movie today requires you to "change the channel" to whichever service offers that movie.

Can you imagine your favorite playlist needing to swap among 10 apps, each requiring a $10/month subscription?

> The thing is, this doesn't even seem particularly useful for average consumers/listeners, since Spotify itself is so convenient, and trying to locate individual tracks in massive torrent files of presumably 10,000's of tracks each sounds horrible.

Are you aware Anna's Archive already solved the exact same problem with books?

> this doesn't even seem particularly useful for average consumers/listeners

I can imagine this making it wayyy easier to build something like Lidarr but for individual tracks instead of albums.

I dunno; if they publish, like, a 10 TB torrent of the most popular music, I can see people making their own music services. A 10 TB hard disk is easily affordable, and that's about 3 million songs, which is way more than anyone could listen to in a lifetime, even if you reduce that by 100x to account for taste.

It's probably going to make the AI music generation problem worse anyway...

  • I would expect more data to make AI music generation better

    • The problem isn't the generation, it's the taste of the generators.

      An earnest young lady with a guitar can already sing a light jazz version of 'Highway to Hell' or whatever. Just go to your local cafe to hear it. The objective quality is terrific.

      In the past, this wouldn't have been made because the end result is subjectively banal. But now people with no taste can churn it out by the thousands of hours for free.
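A quick back-of-envelope check of the "10 TB is about 3 million songs" estimate above, as a minimal Python sketch. The 3.5-minute average track length and the 128 kbit/s rate are assumptions for illustration; 160 kbit/s is the free-tier bitrate mentioned elsewhere in the thread. Sizes use decimal units (1 TB = 10**12 bytes, 1 kbit = 1000 bits).

```python
# Back-of-envelope storage arithmetic; bitrates and track length are assumptions.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def bytes_per_track(bitrate_kbps: int, seconds: int) -> int:
    """Size of one constant-bitrate track in bytes (1 kbit = 1000 bits)."""
    return bitrate_kbps * 1000 * seconds // 8


def songs_that_fit(disk_bytes: int, bitrate_kbps: int, seconds: int) -> int:
    """How many whole tracks of this size fit in disk_bytes."""
    return disk_bytes // bytes_per_track(bitrate_kbps, seconds)


def nonstop_listening_years(n_songs: int, seconds: int) -> float:
    """Years of 24/7 playback needed to hear every song once."""
    return n_songs * seconds / SECONDS_PER_YEAR


TEN_TB = 10 * 10**12  # decimal terabytes, as drive vendors count them
AVG_SECONDS = 210     # assumed 3.5-minute average track

for kbps in (128, 160):
    print(f"{kbps} kbit/s: {songs_that_fit(TEN_TB, kbps, AVG_SECONDS):,} songs")
print(f"3M songs: {nonstop_listening_years(3_000_000, AVG_SECONDS):.1f} years nonstop")
```

Under these assumptions "about 3 million" holds at roughly 128 kbit/s (about 2.98 million tracks); at 160 kbit/s it is closer to 2.4 million. Either way, 3 million tracks is roughly two decades of nonstop listening.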

>The thing is, this doesn't even seem particularly useful for average consumer

it's an archive to defend against Spotify going away. Remember when Netflix had everything, and then that eroded and now you can only rely on stuff that Netflix produced itself?

the average consumer will flock when Spotify ultimately enshittifies

  • Netflix didn't lose content by choice. Actual right holders decided to pull their content and create rival services.

    Has nothing to do with perceived enshittification by Netflix (even though they have enshittification too).

    Spotify is under the same threat: they have no content that they own. Everything is licensed.

    • Spotify is banking on AI music which is enough to tell you everything you need to know about the company, their C-suite and their opinion on music.


    • But, Netflix did lose their content by choice! Way back in the 00s, you could pay Netflix something like $5 a month, and they would mail you physical DVDs of almost any movies you could ever want to watch. In fact, my recollection is that the physical library was generally much more extensive than the streaming library, at least through the early ‘10s.

      Sure, they had the rug yanked out from under them with digital streaming, but they very deliberately put themselves into that position when they pivoted to streaming in the first place.


>> But this does seem like it will be a godsend for researchers working on things like music classification and generation. The only thing is, you can't really publicly admit exactly what dataset you trained/tested on...?

Didn't Meta already publicly admit they trained their current models on pirated content? They're too big to fail. I look forward to my music Slop.

  • They are too big to fail, but they aren’t too big to have to pay out a huge settlement. Facebook’s annual revenue is several times that of the entire global recording industry. The strategy these companies took was probably correct, but that calculation included the high risk of ultimately having to pay out down the line. Don’t mistake their current resistance to paying for an internal belief they never will have to.

    • > They are too big to fail but they aren’t too big to have to pay out a huge settlement. Facebook [...]

      I think it's pretty clear from history that they are too big to have to pay out a huge settlement.

      First, they never had to. There was never a "huge" settlement, nothing that actually did hurt.

      Second, the US don't do any kind of antitrust, and if a government outside the US tries to fine a US TooBigTech, the US will bully that government (or group of governments) until they give up.


Just cite facebook getting busted training its AI on torrents proven to contain unlicensed material lol

This, indeed, mostly has implications for ML, training, etc., as otherwise the whole catalog is available to partners, but costs a lot. So Anna did indeed liberate the content, but I'm definitely not switching off my Spotify subscription, even though, in my personal taste, neither the quality nor the UI matches Apple Music. It is still useful to have someone serve the content for you.

DRM aside, Spotify clearly should have logic that throttles your account based on requests (only so many minutes in a day..), making it entirely impractical to download the entirety of it unless you have millions of accounts.

  • >unless you have millions of accounts.

    Challenge accepted…

    This is probably how they did it: use a few thousand accounts, queue up all the things, and download everything over the course of a year.

    • Notably, 160 kbit/s is the free-tier bitrate, so they presumably used unpaid accounts.
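A minimal sketch of the per-account daily cap suggested above, plus the arithmetic behind "a few thousand accounts over a year". The class name, the cap values, the 3.5-minute average track and the one-year window are all assumptions for illustration, not anything Spotify is known to run; the 256M track count is the catalog figure quoted elsewhere in the thread.

```python
import math
from dataclasses import dataclass, field
from datetime import date


@dataclass
class DailyMinuteQuota:
    """Hypothetical per-account limiter: caps minutes of audio served per calendar day."""
    max_minutes_per_day: float
    # (account_id, day) -> minutes already served; old days are never pruned here
    used: dict = field(default_factory=dict, init=False, repr=False)

    def allow(self, account_id: str, track_minutes: float, day: date) -> bool:
        """Return True and record usage if the track fits in today's budget."""
        key = (account_id, day)
        spent = self.used.get(key, 0.0)
        if spent + track_minutes > self.max_minutes_per_day:
            return False  # serving this track would exceed today's cap
        self.used[key] = spent + track_minutes
        return True


def accounts_needed(tracks: int, avg_track_minutes: float,
                    minutes_per_account_per_day: float, days: int) -> int:
    """Accounts needed to pull `tracks` within `days` if each account is capped."""
    total_minutes = tracks * avg_track_minutes
    return math.ceil(total_minutes / (minutes_per_account_per_day * days))


# Assumptions: 256M tracks, 3.5-minute average, one year of downloading.
print(accounts_needed(256_000_000, 3.5, 24 * 60, 365))  # cap equal to real time → 1705
print(accounts_needed(256_000_000, 3.5, 8 * 60, 365))   # stricter 8 h/day cap → 5115
```

The quota counts minutes of audio delivered rather than wall-clock time, so a client fetching faster than real time still runs into it. Even so, under these assumptions a cap equal to real-time playback only pushes the job to roughly 1,700 accounts for a year, in line with the "few thousand accounts" guess; a stricter cap raises the count without making it impossible.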

>The only thing is, you can't really publicly admit exactly what dataset you trained/tested on...?

Curious why not? Assuming you only used the metadata. I think they would be considered raw facts and not copyrightable.

Just like with anything digital you (and Spotify) are fully at the mercy of the rights holders. When (not if) they pull their stuff, or replace their stuff, or change their stuff, you can never get the original back unless you preserve it.

Largest example: a lot of Russian music is not available on Spotify because of the Russia-Ukraine war and Spotify pulling out of Russia. So they don't have the licenses to a lot of stuff, because it belongs to companies operating within Russia.

>I definitely was not aware Spotify DRM had been cracked to enable downloading at scale like this.

What's stopping someone from sticking a microphone next to their speaker?

Slow, but effective.

  • > Slow, but effective.

    I wouldn't call this very effective. It would take an impractically long amount of time to capture a meaningful fraction of the collection and quality would suffer greatly.

  • Even if you plug the audio output into the input you would still be taking a quality loss by passing the audio through a DAC and then an ADC. Maybe if the quality of your hardware is good enough it wouldn't matter, but then you would be limited to only ripping 24 hours of audio per day...

    • You don't have to pass it through a DAC. There's no equivalent of HDCP for protecting digital audio end to end. Crudely, you could capture S/PDIF but really, skip that and just output to a virtual audio device for recording. No DAC in the path either way.

      But yes, it is inconvenient and slow.

    • They recently started offering lossless, could you get down to the equivalent of 320kbps?

      I grew up on sites like Suprnova, and quickly found I could not discern the difference between 320 mp3s and lossless.

      Even now, I only seem to notice if I use a very high end pair of headphones, and mostly with electronic music that has a lot of soft parts with sounds that are in the low or high end of the spectrum.
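For scale, a small sketch comparing the bitrates discussed above: uncompressed CD-quality PCM (44.1 kHz, 16-bit, stereo) versus a 320 kbit/s file and the 160 kbit/s free-tier stream. The 3.5-minute track length is an assumption; lossless formats such as FLAC usually come in below the raw PCM figure, by an amount that depends on the material.

```python
def pcm_kbps(sample_rate_hz: int, bits_per_sample: int, channels: int) -> float:
    """Bitrate of uncompressed PCM audio in kbit/s (1 kbit = 1000 bits)."""
    return sample_rate_hz * bits_per_sample * channels / 1000


def track_megabytes(kbps: float, seconds: float) -> float:
    """Size of a constant-bitrate stream in megabytes (10**6 bytes)."""
    return kbps * 1000 * seconds / 8 / 10**6


cd_kbps = pcm_kbps(44_100, 16, 2)  # CD audio: 44.1 kHz, 16-bit, stereo
for label, kbps in [("CD PCM", cd_kbps), ("320 kbit/s", 320), ("160 kbit/s", 160)]:
    # assumed 210 s (3.5 min) track
    print(f"{label:>10}: {kbps:7.1f} kbit/s, {track_megabytes(kbps, 210):5.1f} MB per track")
```

So a 320 kbit/s stream carries under a quarter of the raw CD data rate (320 / 1411.2 is about 0.23); whether that difference is audible is the perceptual question the comments above are arguing about.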

> I definitely was not aware Spotify DRM had been cracked to enable downloading at scale like this.

Do they have DRM at all? Youtube and Pandora don't.

  • Spotify has DRM, and you can find open-source reimplementations of it on github.

    Their native clients use a weak hand-rolled DRM scheme (which is where the ogg vorbis files come from), whereas the web player uses Widevine with AAC.

  • Yes they do use DRM. I know they are using Widevine on the web player, but possibly other ones too (never looked very far). Not sure for the app, it might be that it is using OGG streams with a custom DRM (which is probably the one some existing downloaders actually (ab)use).

This leak will also be really useful to bad actors who will resell the music from this list without paying royalties to the artists.

  • Which is how Spotify started... And is still carrying on. So nothing has changed.

    • Spotify pays 70% of revenue to rights holders.

      Why don't you ask them where the money intended for artists is going? You know, the small, insignificant companies Sony, Warner Music and EMI that own the vast majority of music and all the contracts?


  • I just started DJing and something I quickly noticed is how garbage Spotify's music sounds compared to FLACs I've purchased. The max bitrate is very low.

    • Tidal is a thing and can be scraped the same way. I wonder how big that collection would be, as a 3-minute track can go from 50 MB to 300 MB.

  • Spotify fucks over most artists anyway, so who cares?

    • yeah it's wild to me how folks will defend the current status quo when it's clearly broken.

      people defend convenience way too much. spotify isn't good for us and spotify-like-streaming is destroying the music industry.

  • this argument is so tired.

    most artists don't really care about streaming or selling their music. most of their real money comes from touring, merch, and people somehow interacting with them.

    most musicians just want to make music, express themselves, and connect with folks who enjoy their stuff or want to make music with em.

    Even some of the largest artists in the world only receive a few grand a year from streaming. Only the top 1% or so of artists get enough streams to even come close to living off it. It isn't that big of a deal. Music piracy isn't the theft people think it is, lars.

    youtube is kind of the same way. the real money comes from sponsorships which come from engagement. nobody on youtube is upset that their video got stolen because that mentality was never sold to us to justify screwing us over. musicians, however, were used as pawns so music labels could get more money.

    now folks will say stuff like "this is theft" which is just a roundabout way of supporting labels who steal from the artists. so, it's just a weird gaslighting. there's a reason folks turned on metallica over the napster stuff. metallica were being used to further the interests of labels over the interests of fans. and now you're doing the same thing :) It's a script we hear over and over again yet people keep falling for it.

    • > most artists dont really care about streaming or selling their music. most of their real money comes from touring, merch, and people somehow interacting with them.

      I think you have it the wrong way round. I'm sure that musicians would love to make money from album / song sales. It's just that between piracy and companies like Spotify, artists make pennies on these activities, so their only choice is to make money on more labor-intensive stuff where they retain more control.

      Note that Spotify, somehow, finds it profitable to be in the streaming business.


    • Spotify paid out ten billion dollars to artists in 2024. This is not small potatoes - total 2024 music industry merchandise sales was around $14b.

      Youtube also paid out literally 50x more to creators in 2024 than Patreon had total subscriptions on the platform.

      These big platform payouts matter a lot.


    • Weird Al pointed out in 2023 that his 80 million Spotify views that year netted him $12 - enough for a nice sandwich.

> Spotify itself is so convenient, and trying to locate individual tracks in massive torrent files of presumably 10,000's of tracks each sounds horrible.

Download the lot to a big NAS and get Claude to write a little frontend with song search and auto playlist recommendations?

The first users of this dataset will be Big Tech corps. Meta, Alphabet, OpenAI, Microsoft, Apple will all be happy to use this dataset for training their LLMs.

For them, 300TB is just cheap

  • They already have this data. See Jukebox from OpenAI, released before ChatGPT.

Thank god we are taking care of the “researchers working on things like music classification and generation” ! As long as we can convince ourselves we have a sound analysis of it, no need to support and defend people making actual art right. So much already made, who needs more?

This is not to defend Spotify (death to it), but to state that opening all of this data for even MORE garbage generation is a step in the wrong direction. The right direction would be to heavily legislate around / regulate companies like Spotify to more fairly compensate the musicians who create the works they train their slop generators with.

  • What, precisely, is the point you’re trying to make here?

    • Expressing frustration at the pervasive tendency of technologists to look at everything, including art which is a reflection of peoples' subjective realities, with an "at-scale" lens, e.g., "let's collect ALL of it, and categorize it, and develop technologies to mash it all together and vomit out derivative averages with no compelling humanist point of view"

      I hope readers will feel our frustration.


  • How does Spotify defend people who actually make art? There's virtually no difference between pirating and streaming through Spotify for the vast majority of artists.

    • Personally as an artist I'd rather give it to people directly for free but I'll meet the audience where they are. The "compensation" does not factor into it at all.

      Interestingly, I'm seeing more and more small bands stepping off of Spotify, mainly because of AI clones and botted stream scams. Apparently they've decided losing that reach is acceptable. (anecdotal ofc. but even on local scale it's an interesting choice)

  • updated - thank you commenters for making it clear that my sentiment was not clear

  • Spotify doesn't take care of artists, if you knew any artists you'd understand that Spotify is atrocious for people who make music.

I believe that we need to distinguish between convenience and preservation here. It is indeed convenient for consumers to use Spotify now whilst it exists and operates the way it does. They could go under, they could change their business model, they could decide to purge everything that is not easily justifiable commercially.

As a society, we should do our best to preserve this trove.

I'd be stunned if we didn't find out Anna's Archive is a front for a handful of shadier VCs who are into AI, even if AA themselves don't know it and just take the cash.

> The thing is, this doesn't even seem particularly useful for average consumers/listeners

Yeah. To me it is not really relevant. I actually was not using Spotify, and if I need to have songs I use yt-dlp for YouTube, but even that is becoming increasingly rare. Today's music just doesn't interest me as much, and I have the songs I listen to regularly. I do, however, also listen to music on YouTube in the background; in fact, that is now my primary use case for YouTube, even surpassing watching movies or anything else. (I do use YouTube for getting some news too, though; it is so sad that Google controls this.)