Comment by xmodem

2 years ago

I've written a podcast ad blocker. I'm not based in the US but listen to a few podcasts from large US-based networks. They don't have any ad inventory to sell so instead they fill the slots with ads for other shows on their network and I end up hearing 7 copies of the same ad for the same true crime sludge in each podcast episode.

It works by fetching the XML feed, downloading the audio file, applying an open-source audio fingerprinting library to find the timestamps of any segment that is repeated multiple times, using ffmpeg to delete all instances of the repeated segments, and republishing a new XML feed that I then consume with a standard podcast app. It works surprisingly well.
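For anyone wanting to replicate the cutting step, here is a minimal sketch of one way it could be done. It is not the code described above: it assumes the fingerprinting step has already produced a list of (start, end) timestamps in seconds, and it uses ffmpeg's `aselect` and `asetpts` audio filters to keep everything outside those ranges. The function names and file names are made up for illustration.

```python
def keep_intervals(duration, cuts):
    """Return the parts of [0, duration] not covered by any cut interval.

    `cuts` is a list of (start, end) pairs in seconds; they may overlap
    and may be given in any order.
    """
    keep, cursor = [], 0.0
    for start, end in sorted(cuts):
        # Clamp each cut to the bounds of the file.
        start = min(max(start, 0.0), duration)
        end = min(max(end, 0.0), duration)
        if start > cursor:
            keep.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < duration:
        keep.append((cursor, duration))
    return keep


def build_ffmpeg_command(src, dst, keep):
    """Build an ffmpeg argument list that keeps only the given intervals.

    aselect passes audio frames whose timestamp falls in a kept interval;
    asetpts regenerates timestamps so the output has no gaps.
    """
    expr = "+".join(f"between(t,{a:.3f},{b:.3f})" for a, b in keep)
    audio_filter = f"aselect='{expr}',asetpts=N/SR/TB"
    return ["ffmpeg", "-y", "-i", src, "-af", audio_filter, dst]


if __name__ == "__main__":
    # Hypothetical example: a 100 s episode with three detected repeats.
    keep = keep_intervals(100.0, [(10, 20), (15, 30), (50, 60)])
    print(keep)
    print(build_ffmpeg_command("episode.mp3", "episode_clean.mp3", keep))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Filtering like this re-encodes the audio, which is the price of cutting at arbitrary timestamps.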

Do you mind sharing the code for this, or pointing me to the audio fingerprinting solution you use? I'd love to replicate this.

  • https://github.com/AddictedCS/soundfingerprinting is the library. It was the first thing I found and it seemed to do the job. I’m not a .NET guy so it’s implemented as a CLI tool that takes a WAV file as input and spits out a list of timestamps.

    Sharing the code would mean getting into the business of publishing an ad blocker which is not something I personally have the bandwidth for. It’s also far from my finest work, being something that I banged out in a weekend. Happy to answer other questions about it, though!
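Since the code itself isn't available, here is a toy sketch of the general idea of finding repeats within a single file. It does not use the soundfingerprinting library or its API. It assumes some other step has already reduced the audio to one hash per fixed-length window (say, one per second), and it flags any run of `min_run` consecutive hashes that occurs at more than one position. All names and parameters are hypothetical.

```python
from collections import defaultdict


def repeated_intervals(hashes, window_seconds=1.0, min_run=3):
    """Return (start, end) times, in seconds, of windows that belong to a
    run of `min_run` consecutive hashes seen at two or more positions."""
    # Index every run of `min_run` consecutive hashes by its start position.
    positions = defaultdict(list)
    for i in range(len(hashes) - min_run + 1):
        positions[tuple(hashes[i:i + min_run])].append(i)

    # Mark every window covered by a run that occurs more than once.
    repeated = set()
    for starts in positions.values():
        if len(starts) > 1:
            for s in starts:
                repeated.update(range(s, s + min_run))

    # Merge adjacent marked windows into time intervals.
    intervals = []
    for i in sorted(repeated):
        start, end = i * window_seconds, (i + 1) * window_seconds
        if intervals and intervals[-1][1] == start:
            intervals[-1] = (intervals[-1][0], end)
        else:
            intervals.append((start, end))
    return intervals


if __name__ == "__main__":
    # Made-up hashes: the run 70, 71, 72 appears twice.
    hashes = [1, 2, 3, 70, 71, 72, 4, 5, 70, 71, 72, 6, 7]
    print(repeated_intervals(hashes))
```

Real fingerprints are noisy, so a real library matches approximately rather than by exact equality. Exact matching like this would also flag long stretches of silence, because identical windows in a row repeat against themselves.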

Does the "same-segment-repeat" check run across multiple episodes? Or does it just rely on a single episode having the same segment repeated a few times, so it works for this specific network?

  • The current implementation searches only within a single episode. Searching across multiple episodes could be done but would introduce false positives if a show re-uses intro music or other assets, and personally I’d prefer to keep that stuff, so I’d need to figure out heuristics to differentiate it from ads.
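To make the false-positive problem concrete, here is a sketch of one possible heuristic, purely as an illustration and not something the implementation above does. It assumes each episode has already been reduced to a list of segment identifiers (hypothetical; in practice these would come from fingerprint matches). A segment repeated within one episode is labelled a likely ad, while a segment that appears at most once per episode but in several episodes is labelled a recurring asset such as intro music.

```python
from collections import Counter


def classify_segments(episodes):
    """Label each segment id from its repeat pattern.

    `episodes` maps an episode id to the list of segment ids found in it.
    """
    repeated_within = set()    # segments seen 2+ times in one episode
    episode_count = Counter()  # number of episodes each segment appears in

    for segments in episodes.values():
        counts = Counter(segments)
        episode_count.update(counts.keys())
        repeated_within.update(s for s, n in counts.items() if n > 1)

    labels = {}
    for segment, n_episodes in episode_count.items():
        if segment in repeated_within:
            labels[segment] = "likely_ad"
        elif n_episodes > 1:
            labels[segment] = "recurring_asset"
        else:
            labels[segment] = "content"
    return labels


if __name__ == "__main__":
    # Made-up data: "promo" repeats inside episodes, "intro" does not.
    episodes = {
        "ep1": ["intro", "promo", "talk1", "promo"],
        "ep2": ["intro", "talk2", "promo", "promo"],
    }
    print(classify_segments(episodes))
```

The weak spot is an ad that plays exactly once per episode: by this rule it looks the same as intro music, which is the ambiguity described above.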