Comment by jdprgm
3 days ago
I actually have a WIP cross-platform app that does exactly this. It's more generic: it processes any audio/video with Whisper and integrates with OpenAI or local LLMs for summarization and other things. I also added a podcast-specific ad-skipping feature. It's not as accurate as something manual like SponsorBlock for YouTube yet, but I'd say it's at about 85% accuracy at the moment, depending on the models used.
Not to hijack OP's great work, but when you say 85%, do you mean true positives? What about the false positives?
My prompting is conservative: it errs on the side of playing an ad if there's a chance it might be part of the actual content, so I'm not really getting false positives at all yet. That said, it's still in development and I haven't reached the stage of running it on a huge collection of podcasts to get more representative statistics.
I think the accuracy of my prompt/LLM is also ~85%. I've got a collection of 2500+ podcast episode transcripts (English language) with ads that I'm going to analyze shortly to find out whether I'm missing any ads or tagging some falsely.
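Since "85%" could mean a few different things in this thread, here is a minimal sketch of one way to separate missed ads from false tags. It assumes you have hand-labeled ad segments and detector output as (start, end) pairs in whole seconds; the function names and the sample episode are hypothetical, not taken from either project.

```python
def to_seconds(segments):
    """Expand (start, end) pairs, in whole seconds, into a set of second indices."""
    covered = set()
    for start, end in segments:
        covered.update(range(start, end))
    return covered


def ad_detection_metrics(predicted, actual):
    """Compare predicted ad segments to labeled ones at one-second granularity."""
    pred = to_seconds(predicted)
    truth = to_seconds(actual)
    tp = len(pred & truth)   # ad seconds correctly skipped
    fp = len(pred - truth)   # real content wrongly skipped
    fn = len(truth - pred)   # ad seconds that still played
    # With nothing predicted (or nothing labeled), treat the ratio as perfect.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return {"tp": tp, "fp": fp, "fn": fn, "precision": precision, "recall": recall}


# Hypothetical episode: two labeled ad breaks; the detector catches the first
# one fully and only the first half of the second.
actual = [(60, 120), (900, 960)]
predicted = [(60, 120), (900, 930)]
metrics = ad_detection_metrics(predicted, actual)
print(metrics)
```

A conservative detector like the one described above would show up here as high precision (few or no false positives) with lower recall (some ad time still plays), which is a more useful pair of numbers than a single accuracy figure.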