Comment by mapontosevenths
2 days ago
I can think of half a dozen ways to detect AI music in its current form, but I'm not sure anyone has actually bothered implementing such a system.
Is anyone here aware of one? I might give it a go if not.
This may be of interest to you. Not exactly what you’re asking, but it may give you clues as to people and research to check out.
https://www.youtube.com/watch?v=xMYm2d9bmEA
I also remembered a friend telling me about someone talking about a method to detect AI music. I forget the specifics (I haven’t watched the video, was only told about it) but I remember the channel.
https://www.youtube.com/@RickBeato
Pretty sure Deezer did an in-depth article about how they detect and remove AI music. But it seems more likely they are detecting artifacts of the current tools, not something that would be impossible to bypass eventually.
IIRC it's not just actual artifacts, but also statistical measures, i.e. songs that are "inhumanly average".
If they're not doing it already, I think some metadata analysis, going by things like upload patterns, would also work well.
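As a toy illustration of the "inhumanly average" idea (my own sketch, not how Deezer actually does it): embed each track as a feature vector, then flag tracks that sit suspiciously close to the corpus centroid on every axis.

```python
import numpy as np

def averageness(track, corpus):
    """Mean absolute z-score of a track's feature vector against a
    corpus. A suspiciously LOW score means the track is closer to
    'average' on every axis than real songs tend to be. 'track' is a
    1-D feature vector, 'corpus' a 2-D (songs x features) array;
    both are hypothetical inputs here."""
    mu = corpus.mean(axis=0)
    sigma = corpus.std(axis=0) + 1e-9   # avoid divide-by-zero
    return float(np.abs((track - mu) / sigma).mean())
```

Where the features come from (tempo, loudness curves, spectral stats, embeddings) matters far more than this scoring step, and any cutoff would need calibrating against known-human music.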
> I can think of half a dozen ways to detect AI music in it's current form
Can you give a few examples?
For example, how would you detect that the song I linked is AI, compared to, say, anything Taylor Swift produces, or to any overly produced pop song or an electronic beat?
My first instincts offhand were:
* N-gram analysis of lyrics. Even good LLMs still exhibit some weird patterns when analyzed at the n-gram level (rough sketch below).
* Entropy - Something like KL divergence maybe? There are a lot of ways to calculate entropy that can be informative. I would expect human music to display higher entropy.
* Plain old FFT. I suspect you'll find weird statistical anomalies (entropy/FFT sketch below).
* Fancy waveform analysis tricks. AIs tend to generate in "chunks", so I would expect the waveforms to have steeper/higher impulses and strange gaps. This probably explains why they still sound "off" to hifi fans.
* SNR analysis - Maybe a repeat of one of the above, but worth expanding on. The actual information density of the channel will be different because diffusion is basically compression.
* Subsampling and comparing to a known library. It's likely that you can identify substantial chunks that are sampled from other sources without modification - harder because you need a library. Basically just Shazam (fingerprint sketch below).
* Consistency checks. Are all of the same note/instrument pairs actually generated by the same instrument throughout, or are they subtly different? Most humans won't notice, but it's probably easy to detect if it drifts.
That's just offhand though. I would need to experiment to see which, if any, actually work. I'm sure there are lots more ways.
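To make the n-gram point concrete, here's a minimal sketch assuming you have the lyrics as plain text. The claim that LLM lyrics reuse stock phrases more than human lyrics do is an assumption to test, not an established result:

```python
from collections import Counter

def ngram_counts(text, n=3):
    """Word-level n-gram counts for a lyric string."""
    words = text.lower().split()
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))

def repetition_score(text, n=3):
    """Fraction of n-gram occurrences that belong to a repeated
    n-gram. Choruses repeat in human lyrics too, so any threshold
    would need calibrating against a human-written corpus."""
    counts = ngram_counts(text, n)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / total
```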
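The entropy and FFT points can share one pipeline: frame the waveform, take magnitude spectra, and look at per-frame Shannon entropy. A sketch assuming a WAV input and leaning on scipy; whether generated audio actually separates on this measure is exactly what would need experimenting:

```python
import numpy as np
from scipy.io import wavfile
from scipy.stats import entropy

def spectral_entropy(path, frame=4096):
    """Mean Shannon entropy (bits) of per-frame magnitude spectra."""
    rate, data = wavfile.read(path)
    if data.ndim > 1:                      # mix stereo down to mono
        data = data.mean(axis=1)
    data = data.astype(np.float64)
    ents = []
    for i in range(0, len(data) - frame, frame):
        mag = np.abs(np.fft.rfft(data[i:i + frame]))
        total = mag.sum()
        if total > 0:                      # skip silent frames
            ents.append(entropy(mag / total, base=2))
    return float(np.mean(ents)) if ents else 0.0
```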
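And the subsampling point is essentially landmark fingerprinting. A stripped-down sketch of the Shazam idea - real systems hash peak pairs with time offsets and index millions of tracks; this just measures raw landmark overlap between two files:

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import stft

def landmarks(path, n_peaks=5):
    """Set of (peak_bin, next_frame_peak_bin) landmark pairs built
    from the strongest frequency bins of each STFT frame."""
    rate, data = wavfile.read(path)
    if data.ndim > 1:
        data = data.mean(axis=1)
    _, _, Z = stft(data.astype(np.float64), fs=rate, nperseg=4096)
    peaks = np.argsort(np.abs(Z), axis=0)[-n_peaks:, :]  # top bins per frame
    pairs = set()
    for t in range(peaks.shape[1] - 1):
        for f1 in peaks[:, t]:
            for f2 in peaks[:, t + 1]:
                pairs.add((int(f1), int(f2)))
    return pairs

# crude overlap score between a candidate and one library track:
# score = len(landmarks("candidate.wav") & landmarks("library.wav"))
```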
Thank you!
This will likely produce a lot of false positives across genres. E.g. I suspect synthpop and trance (and a lot of other electronic music) will hit a lot of those points with regard to the music and sampling.
Lyrics are also not a given (when they are likely curated by humans). E.g. compare the song I referenced (https://dumpstergrooves.bandcamp.com/track/he-talked-a-big-g...) to, say, Taylor Swift's current most-listened-to song: https://genius.com/Taylor-swift-the-fate-of-ophelia-lyrics I'd choose the AI one in a heartbeat :)
I wonder if a combination of all of those might work for a subset of songs, but I don't think you can do it with any confidence :(
1 reply →
For me it's always the voice. It sounds slightly rough and digital/robotic, not smooth and natural.
If it's instrumental only, especially electronic music, then I don't think I could tell.
That doesn't explain how you would disambiguate this at scale. And there are likely a dozen legit genres which use voices like this :)
1 reply →