Comment by 7thpower
6 days ago
I haven’t seen this answer yet, so I’ll chime in:
There is a lot of enthusiasm around using language models for OCR, and I have found that they generally work well. However, I have had much better results, especially on pages with tables, by sending the LLM both the raw page image and the OCR'd text, and asking it to transcribe from the image while validating words and character sequences against the OCR output.
This largely solves the problem of numbers and other details being jumbled or hallucinated.
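A minimal Python sketch of the image-plus-OCR idea, assuming you send the prompt and page image to whichever multimodal LLM you use yourself (no specific LLM API is shown here). The prompt wording, `build_prompt`, and `unverified_numbers` are hypothetical names for illustration. The numeric cross-check is an extra local safeguard, not something described above: it flags numbers in the LLM's transcription that don't appear in the OCR text, so a person can check those against the image. A flag only means the two sources disagree; either one could be wrong.

```python
import re
from collections import Counter

# Hypothetical prompt wording; adapt it to your model. The image itself is
# attached separately through whatever multimodal API you are using.
PROMPT_TEMPLATE = (
    "Transcribe the attached page image exactly, preserving any tables. "
    "Below is OCR output for the same page. Use it to validate words and "
    "character sequences, but treat the image as the source of truth.\n\n"
    "OCR output:\n{ocr_text}"
)

# Matches integers and numbers with thousands/decimal separators, e.g. 1,204.50
NUMBER_RE = re.compile(r"\d+(?:[.,]\d+)*")


def build_prompt(ocr_text: str) -> str:
    """Embed the OCR text in the transcription prompt (hypothetical helper)."""
    return PROMPT_TEMPLATE.format(ocr_text=ocr_text)


def numeric_tokens(text: str) -> Counter:
    """Count each numeric token appearing in the text."""
    return Counter(NUMBER_RE.findall(text))


def unverified_numbers(llm_text: str, ocr_text: str) -> list[str]:
    """Return numbers the LLM output contains more often than the OCR text does.

    Counter subtraction keeps only positive counts, so a number that appears
    twice in the LLM output but once in the OCR text is reported once.
    """
    missing = numeric_tokens(llm_text) - numeric_tokens(ocr_text)
    return sorted(missing.elements())


if __name__ == "__main__":
    ocr = "Revenue 1,204.50 Cost 980"
    llm = "Revenue 1,204.50 Cost 930"
    print(unverified_numbers(llm, ocr))
```

Here the run prints `['930']`, because the OCR read that figure as 980; you would then check the image to see which reading is right. Comparing only numbers keeps the check cheap and targets the failure mode mentioned above, at the cost of ignoring mismatched words.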
I recently tested LlamaParse again, a year after first trying it, and was very impressed. You may be able to do your project on the free tier, and it will do a lot of this for you.