Comment by anon373839

8 months ago

It’s a question of scale. When a traditional OCR system makes an error, it’s confined to a relatively small part of the overall text. (Think of “Plastics” becoming “PIastics”.) When a LLM hallucinates, there is no limit to how much text can be made up. Entire sentences can be rewritten because the model thinks they’re more plausible than the sentences that were actually printed. And because the bias is always toward plausibility, it’s an especially insidious problem.

1 comment

anon373839

themanmaran 8 months ago

It's a bit of a pick your poison situation. You're right that traditional OCR mistakes are usually easy to catch (except when you get $30.28 vs $80.23). Compared to LLM hallucinations that are always plausibly correct.

But on the flip side, layout is often times the biggest determinant of accuracy, and that's something LLMs do a way better job on. It doesn't matter if you have 100% accurate text from a table, but all that text is balled into one big paragraph.

Also the "pick the most plausible" approach is a blessing and a curse. A good example is the handwritten form here [1]. GPT 4o gets the all the email addresses correct because it can reasonably guess these people are all from the same company. Whereas AWS treats them all independently and returns three different emails.

[1] https://getomni.ai/ocr-demo