Comment by raincole

3 months ago

If you can accept that the machine just makes up what it doesn't recognize instead of saying "I don't know," then yes, it's solved.

(I'm not being snarky. It's acceptable in some cases.)

But wasn't this very much the case with existing OCR software as well? In fairness, I guess LLMs will end up making up plausible-looking text instead of text riddled with errors, which makes the mistakes much harder to catch.

Just checked it with Gemini 2.5 Flash. Instructing it to mark low-confidence words seems to work OK(ish).
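For anyone who wants to reproduce this, here's a minimal sketch using the google-generativeai Python SDK. The prompt wording and the [? ... ?] marker convention are my own illustration, not anything Gemini requires:

```python
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")  # assumes a valid API key
model = genai.GenerativeModel("gemini-2.5-flash")

page = Image.open("scan.png")  # hypothetical input scan
prompt = (
    "Transcribe all text in this image exactly. "
    "Wrap any word you are not confident about in [? ... ?] "
    "instead of guessing silently."
)

response = model.generate_content([prompt, page])
print(response.text)
```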

These days it does just that: it'll say null or whatever if you give it the option. When it does make things up, it tends to be a limitation of the image quality (max DPI).
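One way to "give it the option" is to ask for JSON output and explicitly allow null. A sketch with the same SDK; the field names are illustrative assumptions about an invoice-style document:

```python
import json
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")  # assumes a valid API key
model = genai.GenerativeModel("gemini-2.5-flash")

prompt = (
    "Extract invoice_number, date, and total from this scan as JSON. "
    "If a field is unreadable, set it to null instead of guessing."
)
response = model.generate_content(
    [prompt, Image.open("invoice.png")],  # hypothetical input scan
    generation_config={"response_mime_type": "application/json"},
)
fields = json.loads(response.text)  # e.g. {"invoice_number": null, ...}
```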

Blotchy text and certain typefaces make 6s look like 8s. Even a human would think it's an 8 at a glance, then zoom in and see it's a 6.

Google's image quality on uploads is still streets ahead of OpenAI's, for instance, btw.

Do any LLM OCRs give bounding boxes, anyway? Per character and per block.