Comment by vintermann

7 days ago

> OCR is already at the point where adding an LLM at the end is counterproductive

That's mass OCR on printed documents. On handwritten documents, LLMs help. There are tons of documents that even top human experts can't read without context and domain language. Printed documents are intended to be readable character by character. Often the only thing a handwriting author intends is to remind himself of what he was thinking when he wrote it.

Also, what are the downstream tasks? What do you need character-level accuracy for? In my experience, it's often for indexing and search. I believe LLMs have a higher ceiling there, and can in principle (if not in practice, yet) find data and answer questions about a text better than straightforward indexing or search can. I can't count the number of times I've e.g. missed a child in genealogy because I didn't think of searching the (fully and usually correctly) indexed data for some spelling or naming variant.
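To make the variant problem concrete, here is a minimal sketch of fuzzy lookup over an already-indexed name list, using only Python's standard-library `difflib`. The function name `find_name_variants`, the sample records, and the 0.7 cutoff are all made up for illustration; this is not how any particular genealogy index works, and plain string similarity is a much cruder tool than what an LLM could do with context.

```python
import difflib

def find_name_variants(query, indexed_names, cutoff=0.7):
    """Return indexed names whose spelling is close to the query.

    Hypothetical helper: compares lowercased strings with difflib's
    similarity ratio and keeps anything at or above the cutoff.
    """
    # Map lowercased spellings back to the original index entries.
    lowered = {}
    for name in indexed_names:
        lowered.setdefault(name.lower(), []).append(name)

    close = difflib.get_close_matches(
        query.lower(), list(lowered), n=10, cutoff=cutoff
    )
    return [original for key in close for original in lowered[key]]

# Made-up records standing in for a correctly indexed register.
records = [
    "Kristoffer Olsen",
    "Christopher Olsen",
    "Anna Hansdatter",
    "Christoffer Olson",
]

matches = find_name_variants("Kristoffer Olsen", records)
print(matches)
```

An exact search for "Kristoffer Olsen" would only return the first record; the fuzzy lookup also surfaces the "Christopher" and "Christoffer" spellings. It would still miss variants that share no spelling at all (a patronymic versus a farm name, say), which is the kind of gap where context-aware models could plausibly do better.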

I am working with printed documents. Maybe LLMs currently make a difference with handwriting recognition. I wasn't directly responding to that. It's outside the little bit that I know, and I didn't even think of it as "OCR".

I'm not saying that I need high accuracy (though I do), I'm saying that the current accuracy (and to clarify, this is specifically for printed text) is already very high. Part of the reason it's so high is that the old, complicated character-by-character classifiers have already been replaced with neural networks that process entire lines at a time. It's already moving in the direction you're saying we need.