Comment by EarlyOom

2 years ago

We're trying to do something similar with VLM-1 (https://vlm-docs.nos.run/guides/guide-pdf-presentations). We've found that many of the quirks of using LLMs for text parsing (hallucinations, etc.) can be avoided with structured output, which restricts everything to a known schema and output range while also limiting the number of output tokens required.
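
To make the schema idea concrete, here is a minimal sketch in plain Python. It is not VLM-1's API: the schema, field names, and the `parse_slide` helper are all hypothetical, and it only validates output after generation rather than constraining decoding. The point is that once every field has a known type or a closed set of allowed values, anything the model invents outside that range gets rejected instead of passed along.

```python
import json

# Hypothetical schema for one slide of a PDF presentation. The field names
# and allowed values are illustrative assumptions, not taken from VLM-1.
SLIDE_SCHEMA = {
    "title": str,
    "page_number": int,
    "slide_type": {"title", "content", "chart", "table"},  # closed label set
}


def parse_slide(raw: str) -> dict:
    """Parse model output and reject anything outside the known schema."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")

    # No extra fields and no missing fields are allowed.
    mismatched = set(data) ^ set(SLIDE_SCHEMA)
    if mismatched:
        raise ValueError(f"unexpected or missing fields: {sorted(mismatched)}")

    for field, rule in SLIDE_SCHEMA.items():
        value = data[field]
        if isinstance(rule, set):
            # Enumerated field: the value must be one of the known labels.
            if not isinstance(value, str) or value not in rule:
                raise ValueError(f"{field}: {value!r} is not an allowed value")
        elif isinstance(value, bool) or not isinstance(value, rule):
            # bool is a subclass of int in Python, so exclude it explicitly.
            raise ValueError(f"{field}: expected {rule.__name__}")
    return data
```

For example, `parse_slide('{"title": "Q3 Results", "page_number": 4, "slide_type": "chart"}')` returns the parsed dict, while the same input with `"slide_type": "infographic"` raises `ValueError`. Short enumerated fields like `slide_type` are also where the token savings come from, since the model emits one label instead of a free-form description.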
