Comment by jascha_eng
9 months ago
Other than proprietary models, what is better than it today? Just asking in case I ever need OCR and don't want to pay the cloud providers for it :D
Check out https://github.com/mindee/doctr or https://github.com/VikParuchuri/surya for something practical.
A multimodal LLM would of course blow it all out of the water, so some Llama 3-like model is probably SOTA in terms of what you can run yourself. Something like https://huggingface.co/blog/idefics2