Comment by WD-42
4 days ago
Incredible.
> A while ago, we discovered a way to scrape Spotify at scale.
They won't and shouldn’t divulge the details, but I imagine that would be a fun read!
How they manage to transfer 300TB of data while remaining anonymous is also astonishing.
I would guess this can be hidden under normal music streaming activity? But one would need lots of proxies!
Rent a dedicated server, set up Mullvad WireGuard on it or whatever. Download stuff to said server over WireGuard.
Sure, you can also use Tor. The people engaged in copyright-related illegality generally don't.
But then you need to rent a server without leaving any hint of your real identity. Which means going to some dodgy corners of the internet.
I certainly wouldn't attempt
It's hard to imagine anything but physical egress for that kind of volume.
50 free accounts continually streaming music rack up 20 TB in a month, so 300 TB would take about 15 months. Or you use 750 accounts and do it in a month.
I would say it's weird they don't rate limit accounts but probably having a device play music pretty much all the time isn't even that rare of a use case.
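For anyone who wants to redo that arithmetic, here is a rough sketch. It only takes the figures quoted above (20 TB per month for 50 accounts, 300 TB total) at face value; the function names, the 30-day month and decimal terabytes are my own assumptions, and nothing here is measured.

```python
import math

# Figures quoted in the comment above, taken at face value (not measured).
REF_ACCOUNTS = 50          # accounts streaming around the clock
REF_TB_PER_MONTH = 20      # what those accounts are said to pull per month
TARGET_TB = 300            # size of the full scrape

def months_needed(accounts: int, target_tb: float = TARGET_TB) -> float:
    """Months to reach target_tb if volume scales linearly with accounts."""
    return target_tb * REF_ACCOUNTS / (accounts * REF_TB_PER_MONTH)

def accounts_needed(months: float, target_tb: float = TARGET_TB) -> int:
    """Accounts required to reach target_tb within the given number of months."""
    return math.ceil(target_tb * REF_ACCOUNTS / (months * REF_TB_PER_MONTH))

def implied_mbit_per_account(days_per_month: int = 30) -> float:
    """Average per-account bitrate implied by the quoted figures (assumes 1 TB = 1e12 bytes)."""
    bits = REF_TB_PER_MONTH * 1e12 * 8
    seconds = REF_ACCOUNTS * days_per_month * 86400
    return bits / seconds / 1e6

if __name__ == "__main__":
    print(months_needed(50))            # months with 50 accounts
    print(accounts_needed(1))           # accounts for a one-month run
    print(round(implied_mbit_per_account(), 2))
```

With these inputs, 50 accounts need 15 months and a one-month run needs 750 accounts. The quoted figures also imply an average of roughly 1.2 Mbit/s per account, which is worth comparing against whatever bitrate you think a single stream actually uses.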
Perhaps they leased a botnet. https://krebsonsecurity.com/2025/10/aisuru-botnet-shifts-fro...
I mean 300TB is nothing for a streaming service, like it wouldn't even show on a dashboard. They probably did that over weeks, which is invisible.
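To put "over weeks" into numbers, a back-of-the-envelope sketch of the sustained rate involved. The four-week duration is an assumption on my part (the comment only says "weeks"), as are decimal terabytes and the function name.

```python
def sustained_gbit_per_s(total_tb: float, days: float) -> float:
    """Average throughput needed to move total_tb in the given number of days.

    Assumes 1 TB = 1e12 bytes and a perfectly even transfer rate.
    """
    total_bits = total_tb * 1e12 * 8
    seconds = days * 86400
    return total_bits / seconds / 1e9

if __name__ == "__main__":
    # Hypothetical durations; the thread does not say how long it took.
    for days in (14, 28, 56):
        print(days, "days:", round(sustained_gbit_per_s(300, days), 2), "Gbit/s")
```

Under those assumptions, 300 TB over 28 days averages just under 1 Gbit/s in aggregate. Whether that stands out depends on how many accounts and IPs it is spread across.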
It is not hard. But please don't misuse it and ruin the fun for everyone. It is nice to be able to use the music relatively easily for hobby projects. My music server has functionality to play tracks from Spotify this way:
https://codeberg.org/raphson/music-server/src/branch/main/sp...
Where the magic actually happens: https://github.com/librespot-org/librespot
I wonder how many premium accounts Anna’s Archive had to use to scrape the whole thing. Surely Spotify has scrape protection and wouldn’t allow a single account to stream (download) millions of separate tracks.
Seems like librespot does not directly support fetching audio to files, and intentionally so, in order to not get targeted by Spotify. Obviously you can dump the audio to a file as it "plays", but that would be very slow.
So I suppose if one wanted to use librespot for archiving, one would have to modify it to support this use case.
"at scale" could mean they had direct access to a server or to storage, maybe because they had an insider giving them access, or they found secrets that had leaked somewhere?
they're probably just using something like https://github.com/nor-dee/spotizerr-spotify
No way, that would take far too long.
Probably not, those tools don't actually download Spotify tracks at source quality.
There are tools that actually download directly from Spotify (needs premium then) but yeah most of them just use the search and download from other sources like YouTube without mentioning it. I won't say which tools download directly out of fear that they get killed but they exist.