Comment by Nextgrid

25 days ago

Yeah, my point was to download videos in bulk, scan them, and then mark these segments in SponsorBlock.

LLMs failed to produce any kind of performant solution.

2 comments

Nextgrid

Generative models feel like the wrong abstraction here. I would try extracting keyframes and running them through CLIP or SigLIP to get embeddings. Then you can just do vector search to match the segments. Much lighter on compute.
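A rough sketch of the matching step, assuming the keyframe embeddings have already been computed (by CLIP, SigLIP, or anything else; that part is not shown). The names `find_segments`, `threshold`, and `max_gap` are made up for illustration, and a brute-force cosine comparison stands in for a real vector index:

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_segments(frames, references, threshold=0.9, max_gap=2.0):
    """Group keyframes that resemble a known reference into time spans.

    frames:     list of (timestamp_seconds, embedding), sorted by time
    references: embeddings of keyframes from already-known segments
    threshold:  minimum cosine similarity to count as a match (assumed value)
    max_gap:    matches closer together than this many seconds are merged
    """
    segments = []
    for ts, emb in frames:
        # Brute-force search; a vector index would replace this loop at scale.
        if not any(cosine(emb, ref) >= threshold for ref in references):
            continue
        if segments and ts - segments[-1][1] <= max_gap:
            segments[-1][1] = ts        # extend the current span
        else:
            segments.append([ts, ts])   # start a new span
    return [tuple(s) for s in segments]
```

The threshold and gap would need tuning on real data; the values here are placeholders.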

Nextgrid 25 days ago

I was talking about getting LLMs to write the code or come up with an approach. I agree that the resulting solution does not need any kind of LLM, or even ML.