
Comment by eigenvalue

8 days ago

I think the current sweet-spot for speed/efficiency/accuracy is to use Tesseract in combination with an LLM to fix any errors and to improve formatting, as in my open source project which has been shared before as a Show HN:

https://github.com/Dicklesworthstone/llm_aided_ocr

This process also makes it extremely easy to tweak/customize simply by editing the English language prompt texts to prioritize aspects specific to your set of input documents.
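For anyone curious what the basic idea looks like, here's a rough sketch (not the project's actual code, which is linked above and does considerably more, e.g. chunking and multi-pass correction; the model name, prompt wording, and helper names here are just placeholders):

```python
# Rough sketch of a Tesseract + LLM cleanup pass (illustrative only).
import pytesseract
from PIL import Image
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

CORRECTION_PROMPT = (
    "You are correcting raw OCR output. Fix obvious OCR errors "
    "(split words, misread characters), restore paragraph breaks, "
    "and output clean markdown. Do not add or remove content."
)

def ocr_and_clean(image_path: str, model: str = "gpt-4o-mini") -> str:
    # Step 1: raw OCR with Tesseract
    raw_text = pytesseract.image_to_string(Image.open(image_path))

    # Step 2: ask the LLM to repair errors and improve formatting
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": CORRECTION_PROMPT},
            {"role": "user", "content": raw_text},
        ],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(ocr_and_clean("scanned_page.png"))
```

Customizing it for a particular document set is then mostly a matter of editing CORRECTION_PROMPT.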

What kind of accuracy have you reached with this Tesseract+LLM pipeline? I imagine there would be a hard limit on how far the LLM can improve the OCR-extracted text from Tesseract, since Tesseract itself is far from perfect.

Haven't seen many people mention it, but I've just been using the PaddleOCR library on its own and it has been very good for me, often achieving better quality/accuracy than some of the best vision LLMs, and generally much better quality than other open-source OCR models I've tried, like Tesseract.

That said, my use case is focused primarily on digital text, so if you're working with handwritten text, take this with a grain of salt.

https://github.com/PaddlePaddle/PaddleOCR/blob/main/README_e...

https://huggingface.co/spaces/echo840/ocrbench-leaderboard
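If it helps, basic usage is only a few lines (a minimal sketch; the exact shape of the result list varies a bit between PaddleOCR versions):

```python
# Minimal PaddleOCR usage sketch (illustrative; the result structure can
# differ slightly between PaddleOCR versions).
from paddleocr import PaddleOCR

ocr = PaddleOCR(use_angle_cls=True, lang="en")  # downloads models on first run
result = ocr.ocr("scanned_page.png", cls=True)

# Each detected line comes back as [bounding_box, (text, confidence)]
for line in result[0]:
    box, (text, confidence) = line
    print(f"{confidence:.2f}  {text}")
```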

Have you used your project on classical languages like Latin, Ancient Greek, Hebrew, etc.? Will the LLM fall flat in those cases, or will it be able to help?

  • I haven’t, but I bet it would work pretty well, particularly if you tweaked the prompts to explain that it’s dealing with Ancient Greek (or whatever the language is) and gave a couple of examples of how to handle things.
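For instance, the correction prompt might be adapted along these lines (hypothetical wording, not the project's actual prompt text):

```python
# Hypothetical prompt tweak for Ancient Greek sources, just illustrating the
# kind of edit meant above (not taken from the project).
CORRECTION_PROMPT = (
    "You are correcting raw OCR output of an Ancient Greek text. "
    "Preserve polytonic diacritics, do not transliterate, and fix "
    "characters commonly misread by OCR (e.g. Latin 'o' in place of "
    "omicron 'ο'). Output the corrected Greek text only."
)
```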