Comment by lukeschlather
8 hours ago
LLMs improve significantly on state of the art OCR. LLMs can do contextual analysis. If I were transcribing these by hand, I would probably feed them through OCR + an LLM, then ask an LLM to compare my transcription to its transcription and comment on any discrepancies. I wouldn't be surprised if I offered minimal improvement over just having the LLM do it though.
Are you guessing, or are there results somewhere that demonstrate how LLMs improve OCR in practical applications?
Someone linked this above
https://trustdecision.com/resources/blog/revolutionizing-ocr...
> Our internal tests reveal a leap in accuracy from 98.97% to 99.56%, while customer test sets have shown an increase from 95.61% to 98.02%. In some cases where the document photos are unclear or poorly formatted, the accuracy could be improved by over 20% to 30%.
While a small percentage increase, when applied to massive amounts of text it’s a big deal.
Why assume that OCR does not involve context? OCR systems regularly use context. It doesnt require an LLM for a machine reading medical forms to generate and use a list of the hundred most common drugs appearing in a paticular place on a specific form. And an OCR reading envelopes can be directed to prefer numbers or letters depending on what it expects.
Even if LLMs can push a 99.9% accuracy to 99.99, at least an OCR-based system can be audited. Ask an OCR vendor why the machine confused "Vancouver WA" and "Vancouver CA" and one can get a solid answer based in repeated testing. Ask an LLM vendor why and, at best, you'll get a shrug and some line citing how much better they were in all the other situations.