
Comment by adzm

3 months ago

I am talking about using spectrograms (a Fourier transform into the frequency domain, plotted over time), which turn a song into a 2D image. That image is used to train something like Stable Diffusion (some projects actually use Stable Diffusion itself) to generate new spectrograms, which are then converted back into audio. Riffusion used this approach.
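For anyone who wants to see that round trip concretely, here is a rough sketch with librosa. It uses a plain STFT magnitude and Griffin-Lim rather than Riffusion's actual pipeline (which works on mel spectrogram images), and the file name and parameters are just placeholders:

```python
# Minimal sketch of the spectrogram round trip described above
# (audio -> 2D "image" -> audio). Not Riffusion's actual pipeline;
# the file name and parameters here are placeholders.
import numpy as np
import librosa
import soundfile as sf

y, sr = librosa.load("song.wav", sr=22050, mono=True)    # hypothetical input file

# Forward: short-time Fourier transform, keep only the magnitude.
# The resulting 2D array (frequency bins x time frames) is what gets
# rendered as the spectrogram image a diffusion model would train on.
hop = 512
S = np.abs(librosa.stft(y, n_fft=2048, hop_length=hop))
image = librosa.amplitude_to_db(S, ref=np.max)            # log-scaled, image-like

# Inverse: the phase was thrown away, so estimate it with Griffin-Lim.
# This is why the mapping back to audio is approximate, not exact.
y_hat = librosa.griffinlim(S, n_iter=32, hop_length=hop)
sf.write("reconstructed.wav", y_hat, sr)
```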

If you think about it, a music sheet is just a graph of a Fourier transform. It shows, at any point in time, which frequency is present (the pitch of the note) and for how long (the duration of the note).

  • It is no such thing. Nobody maps overtones onto the sheet, durations are only approximate, you have to macro-expand all the flats/sharps, volume is conveyed with vibe-words, it carries 500+ years of historical compost, and so on. Sheet music is to an FFT what wine tasting is to a healthy meal.
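For what it's worth, the loose pitch-over-time mapping from the comment above can be sketched in a few lines; the objections in the reply still stand, since this ignores overtones, dynamics, articulation, and everything else notation actually encodes. The note list and tempo below are made up for illustration:

```python
# Toy illustration of the "sheet music as pitch over time" analogy:
# each note becomes (frequency in Hz, start time, end time).
# Equal temperament assumed; overtones and dynamics are ignored.

def midi_to_hz(midi_note: int) -> float:
    """A440 equal-temperament pitch for a MIDI note number."""
    return 440.0 * 2.0 ** ((midi_note - 69) / 12.0)

# (MIDI note, start beat, length in beats) -- a made-up C major arpeggio.
notes = [(60, 0.0, 1.0), (64, 1.0, 1.0), (67, 2.0, 1.0), (72, 3.0, 2.0)]
tempo_bpm = 120
sec_per_beat = 60.0 / tempo_bpm

for midi, start, length in notes:
    f = midi_to_hz(midi)
    t0, t1 = start * sec_per_beat, (start + length) * sec_per_beat
    print(f"{f:7.2f} Hz from {t0:.2f}s to {t1:.2f}s")
```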

A spectrogram is lossy and not a one-to-one mapping of the waveform. Riffusion is, afaik, limited to five-second clips. At that length, structure and coherence over time aren't important, and the data is strongly spatially correlated: adjacent to a blue pixel is another blue pixel. To the best of my knowledge, no models synthesize whole songs from spectrograms.
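One rough way to check the "adjacent pixels look alike" claim is to correlate neighbouring spectrogram frames. The file name below is a placeholder, and the exact number depends on the track and the hop size:

```python
# Rough check of the spatial-correlation claim: correlate each spectrogram
# column (time frame) with the next one. Values near 1.0 mean neighbouring
# frames are highly redundant. File name is a placeholder.
import numpy as np
import librosa

y, sr = librosa.load("song.wav", sr=22050, mono=True)
S = np.abs(librosa.stft(y, n_fft=2048, hop_length=512))
S_db = librosa.amplitude_to_db(S, ref=np.max)

# Pearson correlation pooled over all adjacent time-frame pairs.
left, right = S_db[:, :-1].ravel(), S_db[:, 1:].ravel()
r = np.corrcoef(left, right)[0, 1]
print(f"correlation between adjacent frames: {r:.3f}")   # typically close to 1
```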

How does Spotify “think” about songs when it is using its algos to find stuff I like?