RicoElectrico 9 months ago
Yeah, Tesseract is barely production quality.
lyu07282 9 months ago
Yeah, it was SOTA in 2006, 18 years ago.
jascha_eng 9 months ago
Other than proprietary models, what is better than it today? Just asking in case I ever need OCR and don't want to pay the cloud providers for it :D
lyu07282 9 months ago
Check out https://github.com/mindee/doctr or https://github.com/VikParuchuri/surya for something practical.
A multimodal LLM would of course blow it all out of the water, so some llama3-like model is probably SOTA in terms of what you can run yourself. Something like https://huggingface.co/blog/idefics2
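For anyone who wants to try the docTR suggestion above, here is a rough sketch of what local OCR with it might look like. It assumes the `python-doctr` package is installed with a deep-learning backend and that pretrained weights can be downloaded on first use. The function names `ocr_image` and `words_from_export` are made up for illustration and are not part of docTR, and the nested layout of the export dictionary is an assumption based on docTR's documented output, so check it against the version you install.

```python
from typing import Any


def words_from_export(export: dict[str, Any]) -> list[str]:
    """Flatten a docTR-style export (pages > blocks > lines > words) into words.

    The nesting and the "value" key are assumed from docTR's documented output.
    """
    return [
        word["value"]
        for page in export.get("pages", [])
        for block in page.get("blocks", [])
        for line in block.get("lines", [])
        for word in line.get("words", [])
    ]


def ocr_image(path: str) -> list[str]:
    """Run docTR's default pretrained pipeline on a single image file."""
    # Imported lazily so the helper above can be used without docTR installed.
    from doctr.io import DocumentFile
    from doctr.models import ocr_predictor

    model = ocr_predictor(pretrained=True)  # fetches weights on first use
    doc = DocumentFile.from_images(path)
    result = model(doc)
    return words_from_export(result.export())
```

Calling `ocr_image("scan.png")` (the file name is a placeholder) should give back the recognized words in reading order. Whether that beats Tesseract on your documents is something you would have to measure yourself.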