Comment by miles
9 days ago
As this project is geared toward "early modern prints", any recommendations for the best OCR/LLM solution for poor-quality typed manuscripts?
9 days ago
I was playing with PaddleOCR a while ago, and it seemed to work quite well. It seems to be geared toward Chinese, but in my experience it also works with other languages.
I created a wrapper of PaddleOCR: https://github.com/gutenye/ocr
If cloud options are OK, Gemini has been getting a good rep lately: it’s good at OCR and cheaper than the rest.
If we’re doing recommendations, I’d like to know what everyone is using for handwriting.
Researchers use Transkribus.
You could try MiniCPM-V 2.6.