Comment by bjourne
3 months ago
A spectrogram is lossy and not a one-to-one mapping of the waveform. Riffusion is, afaik, limited to five-second-clips. For these, structure and coherence over time isn't important and the data is strongly spatially correlated. E.g., adjacent to a blue pixel is another blue pixel. To the best of my knowledge no models synthesize whole songs from spectrograms.
No comments yet
Contribute on Hacker News ↗