← Back to context

Comment by aidenn0

8 days ago

Tesseract wildly outperforms any VLM I've tried (as of November 2024) for clean scans of machine-printed text. True, this is the best case for Tesseract, but by "wildly outperforms" I mean: given a page that Tesseract had a few errors on, the VLM misread the text everywhere that Tesseract did, plus more.

On top of that, the linked article suggests that Gemini 2.0 can't give meaningful bounding boxes for the text it OCRs, which further limits the places in which it can be used.

I strongly suspect that traditional OCR systems will become obsolete, but we aren't there yet.