Comment by daemonologist

11 hours ago

It's interesting to me that all AI music sounds slightly sibilant - like someone taped a sheet of paper to the speaker or covered my head in dry leaves. I know no model is perfect but I'd have thought they'd have ironed out this problem by now, given how pervasive it is and how significantly it degrades the end product.

I've noticed this too, and I have a few theories about it. Disclosure: I know a little about audio, and very little about generative audio AI.

First, perhaps the models are trained on relatively low-bitrate encodings. Just as image models sometimes reproduce JPEG artifacts, we could be hearing the known high-frequency loss of low-bitrate audio codecs. Another idea: 'S' and 'T' sounds and similar consonants are relatively broad-spectrum, not unlike white noise, and that kind of sound is known to be difficult for lossy frequency-domain encoding schemes. Perhaps these models work in a similar domain and are subject to similar constraints. Perhaps there's also a trade-off between low-pass filtering and "warbly" artifacts, and we're hearing a middle-ground compromise.
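The white-noise comparison is easy to check numerically: a broadband hiss spreads its energy across nearly every frequency bin, while a pitched tone concentrates it in one, which is roughly what the spectral-flatness measure captures. A minimal sketch (the signals and the flatness helper here are my own illustration, not taken from any codec):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4096
sr = 16000

# Pure tone: pick an exact FFT-bin frequency so all energy lands in one bin.
t = np.arange(n) / sr
tone = np.sin(2 * np.pi * (sr * 32 / n) * t)

# White-noise burst: a crude stand-in for the broadband hiss of an 'S'.
noise = rng.standard_normal(n)

def spectral_flatness(x):
    """Geometric / arithmetic mean of the power spectrum (1.0 = perfectly flat)."""
    p = np.abs(np.fft.rfft(x)) ** 2 + 1e-12  # epsilon avoids log(0)
    return np.exp(np.mean(np.log(p))) / np.mean(p)

print(spectral_flatness(tone))   # near 0: one dominant bin
print(spectral_flatness(noise))  # far higher: energy spread across all bins
```

A tonal sound can be described cheaply by a few strong bins; flat, noise-like spectra have no such compact description, which is one intuition for why codecs (and possibly waveform models) smear sibilants.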

I don't know how it happens, but when I hear the "AI" sound in music, this is usually one of the first tells.

Agreed. I find that particularly annoying, and I also seem to find that the spatial arrangement (the stereo image) is muted for most instruments, or that the model simply doesn't use stereo as well as a good human musician would.

I suspect it's because these models generate music as a waveform incrementally rather than globally, so they favor smoothly varying sounds over sharp contrasts. If a model generated MIDI data and then used a MIDI synth to render the audio, you wouldn't get that.
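To illustrate the symbolic route: because timing lives in discrete note events rather than in the waveform itself, every onset is rendered sample-exact regardless of how the event list was generated. A toy sketch (the event format and the naive sine synth are invented for this example, not a real MIDI renderer):

```python
import numpy as np

SR = 22050  # sample rate for the toy renderer

def midi_to_hz(note):
    """Standard MIDI note number -> frequency (A4 = note 69 = 440 Hz)."""
    return 440.0 * 2 ** ((note - 69) / 12)

def render(events, length_s):
    """Render (start_s, dur_s, midi_note) events with a naive sine synth.

    The synthesizer, not the generator, is responsible for the waveform,
    so attacks stay crisp: each note starts at an exact sample index.
    """
    out = np.zeros(int(length_s * SR))
    for start, dur, note in events:
        n = int(dur * SR)
        t = np.arange(n) / SR
        env = np.minimum(1.0, t / 0.005) * np.exp(-3 * t)  # 5 ms attack, decay
        i = int(start * SR)
        out[i:i + n] += env * np.sin(2 * np.pi * midi_to_hz(note) * t)
    return out

# C major arpeggio as symbolic events, rendered to audio in one pass.
audio = render([(0.0, 0.5, 60), (0.5, 0.5, 64), (1.0, 0.5, 67)], 1.5)
```

A waveform model has to "draw" those sharp attacks sample by sample; a symbolic model only has to emit the event, and the synth guarantees the transient.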