Comment by xmodem
2 years ago
I've written a podcast ad blocker. I'm not based in the US but listen to a few podcasts from large US-based networks. They don't have any ad inventory to sell, so instead they fill the slots with ads for other shows on their network, and I end up hearing 7 copies of the same ad for the same true crime sludge in each podcast episode.
It works by fetching the XML feed, downloading the audio file, and applying an open-source audio fingerprinting library to find the timestamps of any segment that is repeated multiple times. It then uses ffmpeg to delete all instances of the repeated segments and republishes a new XML feed, which I consume with a standard podcast app. It works surprisingly well.
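For anyone wanting to try the cutting step, here is a minimal sketch in Python. It is not the code described above, just one way it could be done. It assumes the fingerprinting step has already produced a list of (start, end) timestamps in seconds, and it only builds the ffmpeg argument list rather than running it. The function names and file names are made up for illustration; the `aselect` and `asetpts` filters are standard ffmpeg audio filters.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_seconds, end_seconds)


def merge_intervals(segments: List[Interval]) -> List[Interval]:
    """Sort segments and merge any that overlap or touch."""
    merged: List[Interval] = []
    for start, end in sorted(segments):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous segment: extend it instead of adding a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def build_ffmpeg_args(src: str, dst: str, segments: List[Interval]) -> List[str]:
    """Build an ffmpeg command that drops the given segments from the audio.

    Hypothetical helper: the segment list is assumed to come from the
    fingerprinting step, whose real output format is not known here.
    """
    merged = merge_intervals(segments)
    if not merged:
        # Nothing to remove, so just copy the stream unchanged.
        return ["ffmpeg", "-i", src, "-c", "copy", dst]
    # aselect keeps only samples whose timestamp t is outside every segment;
    # asetpts rewrites timestamps so the kept audio plays back contiguously.
    drop = "+".join(f"between(t,{s:.3f},{e:.3f})" for s, e in merged)
    audio_filter = f"aselect='not({drop})',asetpts=N/SR/TB"
    return ["ffmpeg", "-i", src, "-af", audio_filter, dst]


if __name__ == "__main__":
    # Example: two repeated ad slots found in an episode.
    print(build_ffmpeg_args("episode.mp3", "clean.mp3", [(10.0, 20.0), (50.0, 60.0)]))
```

The resulting list can be passed to `subprocess.run` without a shell, so the single quotes in the filter string reach ffmpeg intact. Filtering forces a re-encode of the audio, which is slower than a stream copy but avoids cutting on frame boundaries by hand.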
Do you mind sharing the code for this, or pointing me to the audio fingerprinting solution you use? I'd love to replicate this.
https://github.com/AddictedCS/soundfingerprinting is the library. It was the first thing I found and it seemed to do the job. I’m not a .NET guy so it’s implemented as a CLI tool that takes a WAV file as input and spits out a list of timestamps.
Sharing the code would mean getting into the business of publishing an ad blocker which is not something I personally have the bandwidth for. It’s also far from my finest work, being something that I banged out in a weekend. Happy to answer other questions about it, though!
Fair point on the publishing! Thanks for sharing the library.
Is the "same-segment-repeat" check across multiple episodes? Or is it just within a single episode that has the same segment repeated a few times, so it works for this specific network?
The current implementation searches only within a single episode. Searching across multiple episodes could be done but would introduce false positives if a show re-uses intro music or other assets, and personally I’d prefer to keep that stuff, so I’d need to figure out heuristics to differentiate it from ads.
Some interview podcasts start with a 3-4 second clip from the interview. Does this cause false positives for you?