Comment by pilooch

1 year ago

It's good and useful to see empirical analyses like this. I use open & custom VLMs a lot. The point of VLMs is that OCR is not needed anymore: it's intrinsic to the model. For instance, at work we've developed a family of vision-based RAG systems, and its performance is twice that of a text-based one. The point I'd like to make here is that OCR is an intermediate step that is not explicitly needed anymore, in many cases. My hunch is that pure OCR will go away.
