Comment by piker

3 months ago

This looks really cool for prototyping and playing around.

It seems to me, though, that if one is building a modern application that needs to get image segmentation and/or text recognition right, there are better APIs available than natural language? It seems like a lot of effort to build a production-scale CV application only to weigh it down with all of an LLM's shortcomings. This isn't a field I'm familiar with, but I would assume this doesn't produce state-of-the-art results; if it did, that would change the analysis.

As a hobby photographer, I organise everything for speedy retrieval, but this would be amazing for searching my collection.

Imagine you build an image segmentation model for, e.g., a specific industrial application.

With this LLM approach, you can at least create your training data from the raw images using natural language.