Comment by mehulashah
16 days ago
(Disclosure, CEO of Aryn (https://aryn.ai/) here)
Good post. VLMs are improving, and Gemini 2.0 definitely changes the doc prep and ingestion pipeline across the board.
What we're finding as we work with enterprise customers:
1. Attribution is super important, and VLMs aren't there yet. Combining them with layout analysis makes for a winning combo (see the sketch after this list).
2. VLMs are great at prompt-based extraction, but for document automation, where you don't know in advance which parts of a table you'll search or need to reproduce faithfully, precise table extraction is important.
3. VLMs will continue to get better, but the price points are a result of economies of scale that document parsing vendors don't get. On the flip side, document parsing vendors have deployment models that Gemini can't reach.
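To make point 1 concrete, here is a minimal, hypothetical sketch of the "VLM + layout analysis" combo: a layout pass yields text regions with page and bounding-box metadata, and each VLM-extracted field is matched back to its best-fitting region so the answer carries a citation. All names here (LayoutElement, vlm_fields, attribute_fields) are illustrative stand-ins, not any vendor's actual API.

```python
# Hypothetical sketch: attribute VLM-extracted fields to layout regions.
# Assumes you already have (a) VLM output as field -> extracted text and
# (b) layout-analysis output as elements with page/bbox metadata.
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class LayoutElement:
    text: str                                 # text recovered for this region
    page: int                                 # 1-based page number
    bbox: tuple[float, float, float, float]   # (x0, y0, x1, y1)


def attribute_fields(
    vlm_fields: dict[str, str],
    elements: list[LayoutElement],
    min_score: float = 0.6,
) -> dict[str, dict]:
    """Map each extracted field to the layout element that best matches it,
    so every answer carries a page + bounding-box attribution."""
    attributed = {}
    for field, value in vlm_fields.items():
        best, best_score = None, 0.0
        for el in elements:
            score = SequenceMatcher(None, value.lower(), el.text.lower()).ratio()
            if score > best_score:
                best, best_score = el, score
        matched = best is not None and best_score >= min_score
        attributed[field] = {
            "value": value,
            "page": best.page if matched else None,
            "bbox": best.bbox if matched else None,
            "score": round(best_score, 2),
        }
    return attributed


if __name__ == "__main__":
    elements = [
        LayoutElement("Invoice #2024-117", page=1, bbox=(72, 80, 260, 100)),
        LayoutElement("Total due: $1,284.00", page=2, bbox=(72, 640, 310, 660)),
    ]
    vlm_fields = {"invoice_number": "Invoice #2024-117", "total": "$1,284.00"}
    print(attribute_fields(vlm_fields, elements))
```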