Comment by farseer

4 months ago

How good is this compared to most commercial OCR software?

6 comments

farseer

Any vision model is better than commercial OCR software.

Etheryte 4 months ago
I'm not really sure that's an accurate summary of the state of the art; [0] is a better overview. In short: SOTA multi-modal LLMs are the best option for handwriting, and nearly anything is good at printed text. For printed media, specialty models from hyperscalers are slightly better than multi-modal LLMs.
[0] https://research.aimultiple.com/ocr-accuracy/
- ozim 4 months ago
  
  I see it confirms what I wrote: the state of the art is “not using Tesseract anymore”, and I think a bunch of commercial solutions are stuck with Tesseract.
  
dragonwriter 4 months ago

Since “commercial OCR software” includes VLM-based commercial offerings, that's clearly not correct.
szundi 4 months ago

[dead]