Comment by mattnewton
12 hours ago
Eh, I guess we’ve looked at different PDFs and models. Gemini 2.5 Flash is very good, and Gemini 2.0 and Claude 3.7 were passable at parsing complicated tables out of image chunks, and we had a fairly small prompt that worked in >90% of cases. Where we had failures, they were almost always from asking the model to do something infeasible (like parsing a table whose header was on a previous page that wasn’t provided).
If you have a better way to parse PDFs using OpenCV or whatever, please offer it as a service and people will buy it for their RAG chatbots or to train VLMs.