Comment by rvnx

9 months ago

One good self-hosted OCR is PaddleOCR, https://github.com/PaddlePaddle/PaddleOCR

Beats everything else, truly international and multi-lingual, including Chinese (as it is made in China)

It is insanely fast compared to the alternatives and has really high accuracy, even on new tasks without any training.

Their PaddleLayout models are also miles ahead compared to LayoutParser or TableTransformers in both inference speed and output quality

Why is it “self-hosted” and not “library + desktop/CLI app”? Doesn't “self-hosted” imply it needs a full web stack and an RDBMS backend?

  • It was just to show that you can run it locally, as opposed to the "cloud APIs" referred to in the thread, but you are right, the more correct term is "local"

    • Thanks. I had clicked the readme but I was on my phone and wasn’t able to translate it to English to see if it was a web app.

Holy Crap! You were right about PaddleOCR. My personal benchmark for OCR tools is to submit several random pages from the first edition Moody's Manual for Railroads.

https://imgur.com/r2RsJeH

The reason I use it is to test whether it's just analyzing letter-by-letter (even if they claim it does more) or if it's actually scanning the letter/word in its context. If it's letter-by-letter, I get hilariously awful results.

Sure, it got things wrong. But it also figured out some things even I couldn't decipher.
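
The letter-by-letter versus in-context distinction above can be illustrated with a toy sketch. It is not how PaddleOCR works internally. It only shows why a reader that checks each word against known vocabulary recovers from single-character misreads that a purely per-character reader would leave in place. The lexicon, the function names, and the 0.7 cutoff are all hypothetical choices made for the illustration.

```python
import difflib

# Hypothetical mini-lexicon for illustration only; a real system would rely on
# a language model or a far larger vocabulary.
LEXICON = ["railroad", "company", "capital", "stock", "bonds", "mortgage"]


def correct_word(raw, lexicon=LEXICON, cutoff=0.7):
    """Snap a noisy per-character reading to the closest known word.

    If no lexicon word is similar enough, the raw reading is returned as-is.
    """
    matches = difflib.get_close_matches(raw.lower(), lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else raw


def correct_line(raw_line):
    """Apply word-level correction to every whitespace-separated token."""
    return " ".join(correct_word(word) for word in raw_line.split())


if __name__ == "__main__":
    # Typical per-character confusions: "l" read as "1", "o" read as "0".
    print(correct_line("rai1road c0mpany b0nds"))
```

A per-character reader would stop at "rai1road", while the word-level pass maps it back to "railroad" because that is the only nearby word in the vocabulary. Tokens with no close match, such as numbers in a financial table, pass through unchanged.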