Comment by inexcf
5 days ago
Got excited about an open-source tool doing this.
Alas, i am let down. It is an open-source tool creating the prompt for the OpenAI API and i can't go and send customer data to them.
I'm aware of https://github.com/clovaai/donut so i hoped this would be more like that.
You can self host OpenAPI compatible models with lmstudio and the like. I've used it with https://anythingllm.com/
Hi. I totally get the concern about sending data to OpenAI. Right now, Documind uses OpenAI's API just so people could quickly get started and see what it is like, but I’m open to adding options and contributions that would be better for privacy.
That sounds great.
I'd recommend checking out vision language models. They generate embeddings of the images themselves (as a collection of patches) and you can see query matching displayed as a heatmap over the document. Picks up text that OCR misses. I built a simple API over it if you want to try it out: https://github.com/DataFog/vlm-api
You might be able to use Ollama, which has a OpenAI compatible API.
Not without chaning the code (should be easy though)
https://github.com/DocumindHQ/documind/blob/d91121739df03867...