Comment by Nextgrid
25 days ago
It's not limited to Shorts, even normal longform videos have had this crap for years now. I hate it too - fortunately SponsorBlock can take care of this, they have optional categories you can enable beyond just sponsors, including the "hook".
I was looking into making an automatic detector for this kind of thing (basically detect if anything in the first ~30 seconds repeats itself later in the video, and if so mark it) but my DSP skills aren't up to the task (and turns out LLMs are useless for these kinds of novel tasks).
In my experience an LLM could probably handle this, and it's not that novel. They can write an image stitcher, which is basically the same problem.
It would probably need to download the whole video first though, so I'm not sure it would work as an extension. And analysing all frames would be expensive upfront if you're using it interactively and waiting for a video to start playing.
You might be able to get away with just looking for repetition in the audio.
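A rough sketch of the audio-only idea, assuming the audio track has already been decoded to a list of mono samples (e.g. with ffmpeg, not shown here). The function names, the window size, and the 0.9 threshold are all made up for illustration. It only compares a coarse loudness envelope rather than doing real fingerprinting, so treat it as a starting point, not a working detector.

```python
import math

def energy_envelope(samples, window):
    """Collapse raw samples into one RMS value per non-overlapping window."""
    return [
        math.sqrt(sum(s * s for s in samples[i:i + window]) / window)
        for i in range(0, len(samples) - window + 1, window)
    ]

def normalized_correlation(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0:
        return 0.0
    return sum(x * y for x, y in zip(da, db)) / denom

def find_repeat(envelope, clip_start, clip_len, search_from, threshold=0.9):
    """Slide an early clip over the rest of the envelope.

    Returns (offset, score) for the best match at or above the threshold,
    or None if nothing later in the envelope looks similar enough.
    """
    clip = envelope[clip_start:clip_start + clip_len]
    best = None
    for off in range(search_from, len(envelope) - clip_len + 1):
        score = normalized_correlation(clip, envelope[off:off + clip_len])
        if score >= threshold and (best is None or score > best[1]):
            best = (off, score)
    return best
```

You would build the envelope once per video with `energy_envelope`, then call `find_repeat` for a few clips taken from the first ~30 seconds, starting the search after the intro ends. The brute-force loop is fine for a coarse envelope but would need an FFT-based correlation to scale to bulk scanning.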
Yeah, my point was to download videos in bulk, scan them, and then mark these segments in SponsorBlock.
LLMs failed to produce any kind of performant solution.
Generative models feel like the wrong abstraction here. I would try extracting keyframes and running them through CLIP or SigLIP to get embeddings. Then you can just do vector search to match the segments. Much lighter on compute.
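The matching step could look something like this, assuming you already have one embedding per keyframe from CLIP or SigLIP (the extraction and embedding calls are left out on purpose). `match_keyframes` and the 0.95 threshold are hypothetical, and a real setup would use a vector index instead of this brute-force loop.

```python
import math

def cosine(u, v):
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def match_keyframes(intro, rest, threshold=0.95):
    """For each intro embedding, find its nearest later embedding.

    Returns a list of (intro_index, rest_index, similarity) for the pairs
    whose similarity reaches the threshold.
    """
    if not rest:
        return []
    matches = []
    for i, query in enumerate(intro):
        best_j, best_sim = max(
            ((j, cosine(query, emb)) for j, emb in enumerate(rest)),
            key=lambda pair: pair[1],
        )
        if best_sim >= threshold:
            matches.append((i, best_j, best_sim))
    return matches
```

If several consecutive intro keyframes map to consecutive later keyframes, that run is a good candidate for a "hook" segment. Isolated single-frame matches are more likely to be noise (title cards, talking-head shots).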