Comment by __rito__

15 days ago

I was just trying a bunch of models for OCR. I only have 4 GB of VRAM in my personal machine.

My goal was to run an OCR model locally and extract text from scanned PDFs.

Many models could not be run at all. Among those that did run (thanks to Ollama), the experience was very poor: llava-llama3, phi3.5 vision, and so on.

What worked best, though still not up to the mark, was Surya [0].

It works perfectly on screenshots of true-text PDFs, but not on scanned ones. It also performs much better on English than on Indian languages.

[0]: https://github.com/VikParuchuri/surya

Yup, the models you tried require a lot of work to run efficiently. On top of that, genuinely decent OCR requires very high-quality document datasets (PDF/Excel/PPTX). Scanned documents are especially hard and cause a lot of issues for LLMs, which often make up content.