Comment by troupo
2 days ago
No one in these threads ever discusses how you would identify and remove AI-generated music.
E.g. why is this worse and in need of removal: https://youtu.be/L3Uyfnp-jag?si=SL4Jc4qeEXVgUpeC while the crap that top pop artists vomit into the world isn't?
I can think of half a dozen ways to detect AI music in its current form, but I'm not sure anyone has actually bothered implementing such a system.
Is anyone here aware of one? I might give it a go if not.
This may be of interest to you. Not exactly what you’re asking, but it may give you clues as to people and research to check out.
https://www.youtube.com/watch?v=xMYm2d9bmEA
I also remembered a friend telling me about someone talking about a method to detect AI music. I forget the specifics (I haven’t watched the video, was only told about it) but I remember the channel.
https://www.youtube.com/@RickBeato
Pretty sure Deezer did an in-depth article about how they detect and remove AI music. But it seems more likely they are detecting artifacts of the current tools, not something that would be impossible to bypass eventually.
IIRC it's not just actual artifacts, but also statistical measures, i.e. songs that are "inhumanly average".
If they're not doing it already, I think some metadata analysis, going by things like upload patterns, would also work well.
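To make the upload-pattern idea concrete, here's a rough Python sketch. The data shape (`(account_id, iso_timestamp)` pairs) and the threshold are made-up assumptions for illustration; a real system would calibrate against actual upload logs.

```python
from collections import defaultdict
from datetime import datetime

def flag_bulk_uploaders(uploads, per_day_threshold=10):
    """uploads: list of (account_id, iso_timestamp) pairs (hypothetical schema).

    Flags accounts that exceed the per-day upload threshold on any single
    day - a crude proxy for the firehose pattern of AI track farms.
    """
    per_day = defaultdict(int)
    for account, ts in uploads:
        day = datetime.fromisoformat(ts).date()
        per_day[(account, day)] += 1
    return sorted({acct for (acct, _), n in per_day.items()
                   if n > per_day_threshold})
```

Obviously trivial to evade by drip-feeding uploads, which is why it would only be one weak signal among many.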
> I can think of half a dozen ways to detect AI music in its current form
Can you give a few examples?
For example, how do you detect that the song I linked is AI, compared to, say, anything Taylor Swift produces, or to any overly produced pop song or electronic beat?
My first instincts offhand were:
* N-gram analysis of lyrics. Even good LLMs still exhibit some weird patterns when analyzed at the n-gram level.
* Entropy - Something like KL divergence maybe? There are a lot of ways to calculate entropy that can be informative. I would expect human music to display higher entropy.
* Plain old FFT. I suspect you'll find weird statistical anomalies.
* Fancy waveform analysis tricks. AIs tend to generate in "chunks", so I would expect the waveforms to have steeper/higher impulses and strange gaps. This probably explains why they still sound "off" to hifi fans.
* SNR analysis - Maybe a repeat of one of the above, but worth expanding on. The actual information density of the channel will be different because diffusion is basically compression.
* Subsampling and comparing to a known library. It's likely that you can identify substantial chunks that are sampled from other sources without modification - Harder because you need a library. Basically just Shazam.
* Consistency checks. Are all of the same note/instrument pairs actually generated by the same instrument throughout, or subtly different. Most humans won't notice, but it's probably easy to detect that it drifts (if it does).
That's just offhand though. I would need to experiment to see which if any actually work. I'm sure there are lots more ways.
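A rough Python sketch of the first two ideas (n-gram repetition in lyrics and entropy of the spectrum). Both metrics here are my own toy formulations, not anyone's production detector, and you'd need labeled data to find out whether either actually separates AI from human output:

```python
from collections import Counter

import numpy as np

def ngram_repetition(text, n=3):
    """Fraction of character n-grams that occur more than once.
    The hypothesis: 'inhumanly average' lyrics skew repetitive."""
    text = text.lower()
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    counts = Counter(grams)
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / max(1, len(grams))

def spectral_entropy(signal, eps=1e-12):
    """Shannon entropy of the normalized power spectrum (via FFT).
    A pure tone is near zero; white noise approaches the maximum."""
    power = np.abs(np.fft.rfft(signal)) ** 2
    p = power / (power.sum() + eps)
    return float(-(p * np.log2(p + eps)).sum())
```

For real audio you'd compute these over short windows and compare distributions against a reference corpus, not look at a single number per track.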
For me it's always the voice. It sounds slightly rough and digital/robotic, not smooth and natural.
If it's instrumental only, especially electronic music then I don't think I could tell.
> but crap that top pop artists vomit out into the world doesn't
It should!