Comment by themanmaran

4 months ago

> OmniAI benchmark that's also referenced here wasn't updated with new models since February 2025. I assume that's because general purpose LLMs have gotten better at OCR than their own OCR product.

Benchmark author here. No, just pivoted away from OCR API as a product! Still use our API internally but have been lazy about updating benchmarks.

Gemini is definitely the best model for OCR, but it has a really high rate of "recitation" errors, where it decides the output tokens are too close to its training data and cuts the response off. That happened something like 10% of the time in our testing. It also has a hilarious hallucination: when there's a blank page in the document mix, it just makes up new info.

OpenAI is OK. GPT-5 wasn't any better than 4o or 4.1. The main issues were dropping content like headers/footers, losing its mind on sideways pages, and frequently refusing to read things like ID documents, health care forms, or anything it judges to have too much PII.
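
A rough sketch of one way to guard against the two Gemini failure modes above (blank-page hallucination and recitation cut-offs). This is an illustration, not necessarily how OmniAI handles it. `is_blank`, `ocr_page`, and the `call_model` callback are hypothetical names, and the thresholds are guesses. The only piece taken from the real API is the "RECITATION" finish reason string. Whether a retry actually clears a recitation error is also an assumption.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical callback: takes page image bytes, returns (text, finish_reason).
# Wire it to whatever client you use; the shape here is an assumption.
OcrCall = Callable[[bytes], Tuple[str, str]]


def is_blank(pixels: Sequence[int], white_threshold: int = 245,
             max_ink_ratio: float = 0.001) -> bool:
    """Treat a grayscale page (0-255 per pixel) as blank if almost no pixel is dark."""
    if not pixels:
        return True
    ink = sum(1 for p in pixels if p < white_threshold)
    return ink / len(pixels) <= max_ink_ratio


def ocr_page(page_bytes: bytes, pixels: Sequence[int], call_model: OcrCall,
             max_attempts: int = 3) -> Optional[str]:
    """Skip blank pages entirely; retry when the model stops for recitation."""
    if is_blank(pixels):
        return ""  # never give the model a chance to invent content
    for _ in range(max_attempts):
        text, finish_reason = call_model(page_bytes)
        if finish_reason != "RECITATION":
            return text
    return None  # still truncated: caller should fall back to another OCR path
```

Returning `None` instead of the truncated text keeps a partial page from being mistaken for a complete one downstream.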
