Comment by lukeschlather

6 months ago

LLMs improve significantly on state-of-the-art OCR. LLMs can do contextual analysis. If I were transcribing these by hand, I would probably feed them through OCR + an LLM, then ask an LLM to compare my transcription to its transcription and comment on any discrepancies. I wouldn't be surprised if I offered minimal improvement over just having the LLM do it, though.
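The comparison step described above does not strictly need an LLM to find the discrepancies; a plain word-level diff can list them, and a human or an LLM can then comment on each one. A minimal sketch using Python's standard `difflib`, where the function name and the sample strings are hypothetical and only there for illustration:

```python
import difflib


def transcription_discrepancies(mine: str, machine: str) -> list[tuple[str, str, str]]:
    """Return (kind, my_words, machine_words) for every span where the two differ."""
    a, b = mine.split(), machine.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # tag is "replace", "delete", or "insert"
            diffs.append((tag, " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return diffs


if __name__ == "__main__":
    # Hypothetical sample: the machine transcription misreads one word.
    hand = "Take 5 mg of warfarin daily"
    ocr = "Take 5 mg of warfarln daily"
    for kind, mine, theirs in transcription_discrepancies(hand, ocr):
        print(f"{kind}: hand={mine!r} machine={theirs!r}")
```

This only locates the disagreements; deciding which side is right is still the part that needs context.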

5 comments

sandworm101 6 months ago

Why assume that OCR does not involve context? OCR systems regularly use context. It doesn't require an LLM for a machine reading medical forms to generate and use a list of the hundred most common drugs appearing in a particular place on a specific form. And an OCR system reading envelopes can be directed to prefer numbers or letters depending on what it expects.
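As a rough illustration of that kind of field-level context, here is a minimal sketch that snaps a raw OCR reading to the closest entry in a per-field word list. It uses `difflib.get_close_matches` from the Python standard library; the drug list, the 0.8 cutoff, and the function name are assumptions made for the example, not how any particular OCR product works.

```python
import difflib

# Hypothetical lexicon for one field on one form.
COMMON_DRUGS = ["warfarin", "metformin", "lisinopril"]


def snap_to_lexicon(raw: str, lexicon: list[str], cutoff: float = 0.8) -> str:
    """Return the closest lexicon entry, or the raw reading if nothing is close enough."""
    matches = difflib.get_close_matches(raw.lower(), lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else raw


if __name__ == "__main__":
    # "rn" misread in place of "m" is a classic OCR confusion.
    print(snap_to_lexicon("metforrnin", COMMON_DRUGS))
    # A reading that resembles nothing in the list is passed through unchanged.
    print(snap_to_lexicon("xyzzy", COMMON_DRUGS))
```

Because the correction is a lookup against a known list with a fixed threshold, every substitution it makes can be traced and tested, which is the auditability point made below.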

Even if LLMs can push a 99.9% accuracy to 99.99%, at least an OCR-based system can be audited. Ask an OCR vendor why the machine confused "Vancouver WA" and "Vancouver CA" and one can get a solid answer based on repeated testing. Ask an LLM vendor why and, at best, you'll get a shrug and some line citing how much better they were in all the other situations.

iterance 6 months ago

Are you guessing, or are there results somewhere that demonstrate how LLMs improve OCR in practical applications?

Modified3019 6 months ago
Someone linked this above
https://trustdecision.com/resources/blog/revolutionizing-ocr...
> Our internal tests reveal a leap in accuracy from 98.97% to 99.56%, while customer test sets have shown an increase from 95.61% to 98.02%. In some cases where the document photos are unclear or poorly formatted, the accuracy could be improved by over 20% to 30%.
While that is a small percentage increase, when applied to massive amounts of text it’s a big deal.
- imtringued 6 months ago
  
  It's not a small percentage. The moment you OCR a book, you'll end up with hundreds to thousands of errors.
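To put numbers on that, here is a back-of-the-envelope sketch. It assumes the quoted accuracies are per-character rates (the linked post may measure something else, such as per-field accuracy) and a hypothetical book of 500,000 characters; both are assumptions for illustration only.

```python
def expected_errors(characters: int, accuracy: float) -> int:
    """Expected number of wrong characters at a given per-character accuracy."""
    return round(characters * (1.0 - accuracy))


def relative_error_reduction(old_accuracy: float, new_accuracy: float) -> float:
    """Fraction of the original errors that the improvement removes."""
    return 1.0 - (1.0 - new_accuracy) / (1.0 - old_accuracy)


if __name__ == "__main__":
    BOOK_CHARS = 500_000  # hypothetical book length
    before = expected_errors(BOOK_CHARS, 0.9897)
    after = expected_errors(BOOK_CHARS, 0.9956)
    cut = relative_error_reduction(0.9897, 0.9956)
    print(f"errors before: {before}, after: {after}, reduction: {cut:.0%}")
```

Under those assumptions, the 0.59-point gain in accuracy corresponds to removing more than half of the errors, which is why looking at the error rate rather than the accuracy gives a better sense of the change.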