Comment by Nextgrid
25 days ago
Yeah, my point was to download videos in bulk, scan them, and then mark these segments in SponsorBlock.
LLMs failed to produce any kind of performant solution.
Generative models feel like the wrong abstraction here. I would try extracting keyframes and running them through CLIP or SigLIP to get embeddings. Then you can just do vector search to match the segments. Much lighter on compute.
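A rough sketch of the matching step, in case it helps. It assumes the keyframe embeddings have already been computed elsewhere (by CLIP, SigLIP, or anything else that outputs fixed-length vectors) and that you have a few reference embeddings of known sponsor frames. The names `find_segments`, `threshold`, and `max_gap` are made up for illustration, and the brute-force loop stands in for a real vector index:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def find_segments(frames, references, threshold=0.9, max_gap=1.0):
    """Group matching keyframes into (start, end) time ranges.

    frames:     list of (timestamp_seconds, embedding) pairs (assumed precomputed)
    references: list of embeddings of known sponsor frames (assumed precomputed)
    threshold:  minimum cosine similarity to count as a match (hypothetical default)
    max_gap:    matches at most this many seconds apart are merged (hypothetical default)
    """
    segments = []
    for ts, emb in sorted(frames, key=lambda f: f[0]):
        # Brute-force nearest-reference lookup; swap in a vector index for scale.
        best = max((cosine(emb, ref) for ref in references), default=0.0)
        if best < threshold:
            continue
        if segments and ts - segments[-1][1] <= max_gap:
            # Extend the current segment to this keyframe.
            segments[-1] = (segments[-1][0], ts)
        else:
            # Start a new segment at this keyframe.
            segments.append((ts, ts))
    return segments

# Toy 2-D vectors standing in for real embeddings.
frames = [
    (0.0, [1.0, 0.0]),
    (1.0, [0.0, 1.0]),
    (2.0, [0.0, 1.0]),
    (3.0, [1.0, 0.0]),
    (4.0, [0.05, 1.0]),
]
references = [[0.0, 1.0]]
print(find_segments(frames, references))
```

The resulting ranges would still need padding and a sanity check before being submitted anywhere, since keyframe timestamps are only approximate segment boundaries.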
I was talking about getting LLMs to write the code or come up with an approach. I agree that the resulting solution does not need any kind of LLM or even ML.