Comment by mehulashah

9 months ago

(Disclosure: CEO of Aryn (https://aryn.ai/) here.)

Good post. VLMs are improving, and Gemini 2.0 definitely changes the doc prep and ingestion pipeline across the board.

What we're finding as we work with enterprise customers:

1. Attribution is super important, and VLMs aren't there yet. Combining them with layout analysis makes for a winning combo.

2. VLMs are great at prompt-based extraction. But if you're building document automation and don't know in advance where in a table you'll need to look, or you need to reproduce tables faithfully, then precise table extraction is important.

3. VLMs will continue to get better, but the price points are a result of economies of scale that document parsing vendors don't get. On the flip side, document parsing vendors have deployment models that Gemini can't reach.