Comment by pilooch

14 days ago

It's good and useful to see empirical analyses like this. I use open & custom VLMs a lot. The point of VLMs is that OCR is not needed anymore: it's intrinsic to the model. For instance, at work we've developed a family of vision-based RAG systems, and its performance is twice that of a text-based one. The point I'd like to make here is that OCR is an intermediate step that is not explicitly needed anymore in many cases. My hunch is that pure OCR will go away.