Comment by rvnx
9 months ago
One good self-hosted OCR is PaddleOCR, https://github.com/PaddlePaddle/PaddleOCR
Beats everything else, truly international and multi-lingual, including Chinese (as it is made in China)
It is insanely fast compared to the alternatives and has really high accuracy, even on new tasks without any training.
Their PaddleLayout models are also miles ahead of LayoutParser or TableTransformers in both inference speed and output quality.
Why is it “self-hosted” and not “library + desktop/CLI app”? Doesn't “self-hosted” imply it needs a full web stack and an RDBMS backend?
It was just to show that you can run it locally, as opposed to the "cloud APIs" referred to in the thread. But you are right, the more accurate term is "local".
Thanks. I had clicked the readme but I was on my phone and wasn’t able to translate it to English to see if it was a web app.
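For anyone else wondering what "local" looks like in practice, here is a minimal sketch of calling it as a plain Python library, with no web stack involved. It assumes the PaddleOCR 2.x Python API (`PaddleOCR(use_angle_cls=True, lang=...)` and `ocr.ocr(path, cls=True)`); newer releases may differ, so check the README. The helper names `extract_lines` and `run_local_ocr` and the `min_confidence` cutoff are made up for illustration.

```python
from typing import Any, List, Tuple


def extract_lines(result: List[Any], min_confidence: float = 0.5) -> List[Tuple[str, float]]:
    """Flatten PaddleOCR 2.x-style output into (text, confidence) pairs.

    Assumed shape: one entry per image, each a list of [box, (text, score)].
    An image with no detected text may come back as None.
    """
    lines = []
    for page in result:
        if not page:
            continue  # nothing detected on this image
        for _box, (text, score) in page:
            if score >= min_confidence:
                lines.append((text, float(score)))
    return lines


def run_local_ocr(image_path: str, lang: str = "en") -> List[Tuple[str, float]]:
    """Run OCR on a local image file (hypothetical wrapper, assumes paddleocr 2.x)."""
    # Imported lazily so extract_lines can be used without paddleocr installed.
    from paddleocr import PaddleOCR

    ocr = PaddleOCR(use_angle_cls=True, lang=lang)  # models are fetched on first use
    return extract_lines(ocr.ocr(image_path, cls=True))
```

Calling `run_local_ocr("page.png")` on a scanned page would then return the recognized lines with their confidence scores, all computed on your own machine.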
I think that's Baidu. I remember https://github.com/PaddlePaddle/ from when Ernie 3.0 was released, back before text encoder models were forgotten amid the progress of decoder-only ones.
Holy crap! You were right about PaddleOCR. My personal benchmark for OCR tools is to submit several random pages from the first edition of Moody's Manual for Railroads.
https://imgur.com/r2RsJeH
The reason I use it is to test whether it's just analyzing letter-by-letter (even if they claim it does more) or if it's actually scanning the letter/word in its context. If it's letter-by-letter, I get hilariously awful results.
Sure, it got things wrong. But it also figured out some things even I couldn't decipher.