Comment by 7thpower

5 months ago

I have not seen this answer so I’ll chime in:

There is a lot of enthusiasm around language models for OCR, and I have found that they generally work well. However, I have had much better results, especially when there are tables and the like, by sending the raw page image to the LLM along with the OCR'd text and asking it to transcribe from the image while validating words and character sequences against the OCR output.

This largely solves the problem of numbers and other details being jumbled or hallucinated.
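A minimal sketch of that workflow, assuming you already have the OCR text and some way to attach the page image to whichever LLM you use (the API call is left out because it differs per provider). `build_prompt` and `unverified_tokens` are hypothetical helper names and the prompt wording is only an example. The second function is an extra local check that goes beyond what I described above: it lists tokens in the LLM's transcription that never appear in the OCR output, so swapped digits or invented numbers get flagged for a manual look.

```python
import re

# Words and numbers; inner commas/periods are kept so "1,204.50" stays one token.
TOKEN = re.compile(r"\w[\w,.]*\w|\w")


def build_prompt(ocr_text: str) -> str:
    """Hypothetical prompt text. The page image itself is attached separately
    through whatever LLM API you use (not shown here)."""
    return (
        "Transcribe the attached page image, keeping tables as tables. "
        "Check each word and number against the OCR text below, which was "
        "produced from the same page.\n\n"
        f"OCR text:\n{ocr_text}"
    )


def unverified_tokens(llm_text: str, ocr_text: str) -> list[str]:
    """Return tokens from the LLM transcription that never appear in the OCR
    output (case-insensitive), in their original order, for manual review."""
    ocr_vocab = {tok.casefold() for tok in TOKEN.findall(ocr_text)}
    return [tok for tok in TOKEN.findall(llm_text) if tok.casefold() not in ocr_vocab]


if __name__ == "__main__":
    ocr = "Net revenue 1,284.50 in Q3"
    llm = "Net revenue 1,204.50 in Q3"
    print(unverified_tokens(llm, ocr))  # ['1,204.50']
```

A flagged token only means the two sources disagree; either one could be wrong, so treat it as a cue to check the page rather than an automatic correction.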

I recently tested LlamaParse again, a year after first trying it, and was very impressed. You may be able to do your project on the free tier, and it will do a lot of this for you.
