Comment by prng2021

10 months ago

It was never “solved” unless you can point me to OCR software that is 100% accurate. You can take 5 seconds to google “ocr with llm” and find tons of articles explaining how LLMs can enhance OCR. Here’s an example:

https://trustdecision.com/resources/blog/revolutionizing-ocr...

14 comments

sandworm101 10 months ago

By that standard, no problem has ever been solved by anyone. I prefer to believe that a great many everyday tech issues were in fact tackled and solved in the past by people who had never even heard of LLMs. So too, many things were done in finance long before blockchains solved everything for us.

asveikau 10 months ago

OCR is very bad.
As an example, look at subtitle rips for DVD and Blu-ray. The discs store subtitles as images of rendered computer text. A popular output format for rippers is SRT, where the subtitles are stored as UTF-8 text and rendered by the player. So when you rip subtitles, there's an OCR step.
This is computer-rendered text in a small handful of fonts, and decent OCR still chokes on it often.

prng2021 10 months ago

From the article I linked:
“Our internal tests reveal a leap in accuracy from 98.97% to 99.56%, while customer test sets have shown an increase from 95.61% to 98.02%. In some cases where the document photos are unclear or poorly formatted, the accuracy could be improved by over 20% to 30%.”
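
One way to read those figures is as error rates rather than accuracy. The sketch below is only arithmetic on the four percentages quoted above; the function name is made up for illustration and nothing here comes from the article itself. On those numbers, the error rate drops by roughly 57% on the internal tests and roughly 55% on the customer test sets.

```python
def relative_error_reduction(acc_before: float, acc_after: float) -> float:
    """Fraction of errors eliminated when accuracy (in percent) rises.

    Hypothetical helper for illustration; inputs are accuracy percentages.
    """
    err_before = 100.0 - acc_before  # error rate before, in percent
    err_after = 100.0 - acc_after    # error rate after, in percent
    return (err_before - err_after) / err_before


# Figures quoted from the linked article.
internal = relative_error_reduction(98.97, 99.56)
customer = relative_error_reduction(95.61, 98.02)

print(f"internal tests: {internal:.1%} fewer errors")
print(f"customer tests: {customer:.1%} fewer errors")
```

Whether that counts as "solved" is the disagreement in this thread; the arithmetic only shows that a small-looking accuracy bump can mean about half as many mistakes to correct by hand.
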
flir 10 months ago

In my experience the chatbots have bumped transcription accuracy quite a bit. (Of course, it's possible I just don't have access to the best-in-class OCR software I should be comparing against.)
(I always go over the transcript by hand, but I'd have to do that with OCR anyway.)

VWWHFSfQ 10 months ago
OCR is not perfect. And therefore it is not "solved".
- Dylan16807 10 months ago
  
  That definition, solved=perfect, is not what sandworm meant and it's an irrelevant definition to this conversation because it's an impossible standard.
  Insisting we switch to that definition is just being unproductive and unhelpful. And it's pure semantics because you know what they meant.
  

tjwebbnorfolk 10 months ago

Point me to handwriting that is 100% legible...

If 100% is your standard, good luck solving anything ever.

elmomle 10 months ago
Most handwriting is legible to its owner. This would indicate that there is enough consistency within a person's writing style to differentiate letters, etc., even if certain assumptions about resemblance to any standard may not hold. I wonder if there are modern OCR methods that incorporate old code-breaking techniques like frequency analysis.
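
To make the frequency-analysis idea concrete, here is a minimal sketch of the classic substitution-cipher first step applied to handwriting. It assumes some earlier stage (not shown, and entirely hypothetical) has already grouped a writer's glyphs into clusters and produced one cluster ID per character. The letter ordering is a commonly quoted approximation for English, not an exact table, and `guess_mapping` is an illustrative name rather than any real OCR API.

```python
from collections import Counter

# English letters in approximate descending frequency order
# (a commonly quoted ordering; exact ranks vary by corpus).
ENGLISH_BY_FREQUENCY = "etaoinshrdlcumwfgypbvkjxqz"


def guess_mapping(glyph_ids):
    """Map each glyph cluster ID to a letter by matching frequency ranks.

    glyph_ids: sequence of hashable cluster IDs, one per written character,
    assumed to come from a hypothetical glyph-clustering step.
    """
    counts = Counter(glyph_ids)
    # most_common() sorts by count, ties broken by first appearance.
    ranked = [glyph for glyph, _ in counts.most_common()]
    return {
        glyph: ENGLISH_BY_FREQUENCY[rank]
        for rank, glyph in enumerate(ranked[:len(ENGLISH_BY_FREQUENCY)])
    }


# Toy input: cluster 3 appears most often, then 1, then 2.
mapping = guess_mapping([3, 1, 3, 2, 3, 1])
print(mapping)
```

On a short note this first guess would be wrong for many letters; in code-breaking it is only a starting point that gets refined with digraph statistics and dictionary checks, and the same would presumably apply here.
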
- Boldened15 10 months ago
  
  > Most handwriting is legible to its owner.
  Not necessarily. I'd be surprised if I could fully understand my old handwritten notes from when I was in school (years ago), since I've always had messy handwriting and no longer have the context in each subject matter to guess.
  LLMs could help in some of those cases, since they would have knowledge of history/chemistry/etc. and could fill in the blanks better than I could at this point, though the hallucinations would no doubt outweigh the benefit.
manquer 10 months ago

I think OP is saying there is always scope for improvement until it is 100%, not that it's 100% or bust.