Comment by storystarling

25 days ago

Generative models feel like the wrong abstraction here. I would try extracting keyframes and running them through CLIP or SigLIP to get embeddings. Then you can just do vector search to match the segments. Much lighter on compute.
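Rough sketch of just the matching step, in case it helps. It assumes you already have one embedding per keyframe from CLIP or SigLIP (not shown here), and it uses brute-force cosine similarity in plain Python as a stand-in for a real vector index. The names `cosine` and `match_segments`, the `min_score` threshold of 0.8, and the toy 3-dimensional vectors are all made up for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)

def match_segments(query_embs, reference_embs, min_score=0.8):
    """For each query keyframe, find the most similar reference keyframe.

    Returns (query_index, reference_index, score) tuples, keeping only
    matches whose score reaches min_score (an arbitrary threshold here).
    """
    matches = []
    for qi, q in enumerate(query_embs):
        best_idx, best_score = None, -1.0
        for ri, r in enumerate(reference_embs):
            score = cosine(q, r)
            if score > best_score:
                best_idx, best_score = ri, score
        if best_idx is not None and best_score >= min_score:
            matches.append((qi, best_idx, best_score))
    return matches

# Toy vectors standing in for real keyframe embeddings.
reference = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
query = [[0.1, 0.9, 0.0], [0.9, 0.1, 0.1], [0.5, 0.5, 0.5]]

matches = match_segments(query, reference)
for qi, ri, score in matches:
    print(f"query keyframe {qi} -> reference keyframe {ri} (score {score:.3f})")
```

The last query vector is equally close to every reference, so it falls under the threshold and is left unmatched. For more than a few thousand keyframes you would probably swap the inner loop for a proper vector index.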

I was talking about getting LLMs to write the code or come up with an approach. I agree that the resulting solution does not need any kind of LLM, or even ML.