Comment by sciencesama

3 days ago

Need to transcribe and skip ads in the podcast! That would be a killer feature!

I actually have a WIP cross-platform app that does exactly this. It is more generic, built around processing any audio/video with Whisper and integrating with OpenAI or local LLMs for summarization and other things, but I also added a podcast-specific ad-skipping feature. It's not as precise as something manual like SponsorBlock for YouTube yet, but I'd say it's at about 85% accuracy at the moment, depending on the models used.

  • Not to hijack OP's great work, but when you say 85%, do you mean true positives? What about false positives?

    • My prompting is conservative: it errs on the side of playing an ad if there is a chance the segment is part of the actual content, so I'm not really getting false positives at all yet. That said, it's still in development, and I haven't yet run it on a large collection of podcasts to get more representative statistics.
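To make the true-positive vs. false-positive question concrete, here is a minimal sketch of how the two could be measured in seconds of audio. It is not taken from the app discussed above; the function names and the idea of comparing against hand-labeled ad intervals are assumptions for illustration only.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_seconds, end_seconds)


def total_overlap(a: List[Interval], b: List[Interval]) -> float:
    """Seconds covered by both lists (each list assumed non-overlapping)."""
    return sum(
        max(0.0, min(e1, e2) - max(s1, s2))
        for s1, e1 in a
        for s2, e2 in b
    )


def precision_recall(predicted: List[Interval], actual: List[Interval]):
    """Precision: share of skipped audio that really was an ad.
    Recall: share of real ad audio that was skipped."""
    true_positive = total_overlap(predicted, actual)
    predicted_total = sum(end - start for start, end in predicted)
    actual_total = sum(end - start for start, end in actual)
    precision = true_positive / predicted_total if predicted_total else 1.0
    recall = true_positive / actual_total if actual_total else 1.0
    return precision, recall
```

A conservative detector like the one described would be expected to score high on precision (few false positives) while giving up some recall (some ads still play).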

Hi, I have a task to check the SponsorBlock API to see if it can work for podcasts. On the other hand, I have some doubts, because I don't know whether that could harm podcasters in some way. But I think something like SponsorBlock will eventually be a feature. P.S. The idea other people are working on using Whisper is awesome.

  • One challenge that will always be present when trying to skip podcast ads is that the vast majority of ads (especially on larger shows) are inserted dynamically, even those read by the host. That means not everybody gets them (they are geo-targeted), and they may not be served to every listener at all times. That makes it a harder problem than the baked-in sections of YouTube videos that SponsorBlock targets.

    • Exactly. The SponsorBlock system would need to be updated to support marking ad block stop and start points by audio signature (much more complicated than simple timestamps) or transcription (compute intensive).

    • Uh, that could be a problem. I was thinking of sponsor reads made by the podcasters themselves, but some platforms do insert extra ads (I've seen them at the beginning or end), and it makes sense that they are country-specific and not the same every time.

I wrote something that does this (in R, of all things)! I still need to polish it up a little and share it, though. It goes: podcast MP3 -> Whisper transcription -> Gemini Flash to identify ad blocks -> FFmpeg to replace ad blocks with a skip sound.
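For anyone curious what the last step of such a pipeline might look like, here is a rough Python sketch (not the R code mentioned above). It assumes the ad blocks have already been identified as (start, end) timestamps in seconds, and it simply cuts them out rather than replacing them with a skip sound. The function names and file names are hypothetical; the command only uses FFmpeg's `atrim`, `asetpts`, and `concat` filters.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_seconds, end_seconds)


def keep_segments(duration: float, ad_blocks: List[Interval]) -> List[Interval]:
    """Return the parts of the episode that are NOT inside any ad block."""
    keep: List[Interval] = []
    cursor = 0.0
    for start, end in sorted(ad_blocks):
        # Clamp to the episode length so bad timestamps cannot escape it.
        start = max(0.0, min(start, duration))
        end = max(0.0, min(end, duration))
        if start > cursor:
            keep.append((cursor, start))
        cursor = max(cursor, end)  # also handles overlapping ad blocks
    if cursor < duration:
        keep.append((cursor, duration))
    return keep


def ffmpeg_command(src: str, dst: str, keep: List[Interval]) -> List[str]:
    """Build an ffmpeg argument list that concatenates the kept segments."""
    if not keep:
        raise ValueError("nothing left to keep")
    parts = []
    labels = ""
    for i, (start, end) in enumerate(keep):
        # Trim one segment and reset its timestamps so concat lines them up.
        parts.append(
            f"[0:a]atrim=start={start}:end={end},asetpts=PTS-STARTPTS[a{i}]"
        )
        labels += f"[a{i}]"
    parts.append(f"{labels}concat=n={len(keep)}:v=0:a=1[out]")
    return ["ffmpeg", "-i", src, "-filter_complex", ";".join(parts),
            "-map", "[out]", dst]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Because of the dynamic ad insertion discussed elsewhere in this thread, the timestamps would only be valid for the exact file that was transcribed.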

  • That sounds really cool!

    This is probably paranoid, but IIWM I'd be a little hesitant to use Gemini since this is a slightly gray area. If Google decides you're violating any ToS they might nuke your entire Google account and any "associated" accounts.

    Depending on the podcast, I would also worry that the content might violate some safety guideline of theirs and land you in hot water. For example, some of the podcasts I listened to in the 2020/2021 timeframe discussed the Lab Leak theory of Covid, back when that was a capital offense on YouTube and would get you taken down and sometimes banned. I'd be worried about Gemini seeing that content as a violation and triggering some sort of automated action against you. It's also worth considering that even if the stuff you listen to is fine now, it might not be the next time they decide to change the rules about what speech is allowed and what isn't.

Something like SponsorBlock for non-YouTube videos would be great. Another killer feature for me would be automatic chapters when the podcast doesn't provide any.