Comment by gopher_space

8 days ago

I just wrapped up a test project[0] based on a comment from that post! My takeaway was that there are a lot of steps in the process you can farm out to cheaper, faster ML models.

For example, the slowest part of my pipeline is picture description, since it needs an LLM (and my project has to run on low-end hardware). Locally I can spin up a tiny LLM and get one-word descriptions in about a minute, but anything larger takes around 30. I might end up sending only the sections I don't have the hardware to process out to a bigger model.
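That routing idea, describe easy images with the tiny local model and queue only the hard ones for a bigger model, could be sketched roughly like this. Everything here (the complexity proxy, the threshold, the function names) is illustrative, not from the actual project:

```python
# Hypothetical sketch: route images between a tiny local model and a larger,
# slower one based on a crude complexity estimate. Placeholder logic only.

def estimate_complexity(size_px: int) -> float:
    """Stand-in for a real difficulty estimate (e.g. edge density); here,
    just megapixels as a rough proxy."""
    return size_px / 1_000_000

def describe_locally(name: str) -> str:
    """Placeholder for a call to the tiny local vision LLM (one-word answer)."""
    return f"{name}: <one-word description>"

def route(images: dict[str, int], threshold: float = 2.0):
    """Split images into a locally-described batch and an offload queue."""
    local, offload = [], []
    for name, size in images.items():
        (local if estimate_complexity(size) <= threshold else offload).append(name)
    return [describe_locally(n) for n in local], offload

local_results, to_offload = route({"scan1.png": 500_000, "poster.png": 8_000_000})
```

The win is that the slow path only ever sees the minority of sections the cheap model can't handle.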

It was a good intro to ML models that incorporate vision, and video is "just" another image pipeline, so it's been easy to treat e.g. facial recognition groupings like any other document section.

[0] https://github.com/jnday/ocr_lol