Comment by eamag

7 days ago

Love the name!

OCR has been discussed here several times lately (https://github.com/Future-House/paper-qa?tab=readme-ov-file#... are using PyMuPDF. My experience with Tesseract is pretty disappointing; it's usually not good enough, and modern LLMs do better.

Thanks, I'll check out these links.

In my tests, I found Tesseract quite good for regular text documents. For other kinds of text, it's not great.

As for using models: there are some good small language models as well, and of course LLMs.

I sorta feel, though, that if one needs complex OCR or a vision model for layout, one should either opt for a commercial solution that abstracts away the deployment and GPU management, or build one's own system.

For most use cases involving text documents, though, my subjective opinion is that Tesseract is sufficient.