Comment by vinothgopi

8 months ago

What is a VLM?

sidmo 7 months ago

VLMs (vision-language models) are cool: they generate embeddings of the images themselves, as a collection of patches, and you can see query matching displayed as a heatmap over the document. They can pick up text that OCR misses. Here's an open-source API demo I built if you want to try it out: https://github.com/DataFog/vlm-api
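To make the heatmap idea concrete, here's a rough sketch of one simple way it could work; this is not the code from the repo above. It assumes you already have one embedding vector per image patch and one per query token. The toy vectors and the `patch_heatmap` helper are made up for illustration, and in practice a real model would produce the embeddings. Each patch gets the cosine similarity of its best-matching query vector, and the scores are laid out in the same grid as the patches.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def patch_heatmap(query_vecs, patch_vecs, grid_w):
    """Score each patch by its best-matching query vector, then lay the
    scores out row by row as a heatmap with grid_w columns.

    Hypothetical helper: patch_vecs is assumed to be in row-major order.
    """
    if grid_w <= 0 or len(patch_vecs) % grid_w != 0:
        raise ValueError("patch count must be a multiple of grid_w")
    scores = [max(cosine(q, p) for q in query_vecs) for p in patch_vecs]
    return [scores[i:i + grid_w] for i in range(0, len(scores), grid_w)]


# Toy 2x2 page with hand-made 3-d embeddings (placeholder values only).
patches = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
]
query = [[0.0, 1.0, 0.0]]  # a single query-token embedding

heatmap = patch_heatmap(query, patches, grid_w=2)
for row in heatmap:
    print(["%.2f" % s for s in row])
```

The highest-scoring cells are the patches that best match the query, which is what you would overlay on the page image as the heatmap.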