Comment by kapitalx
17 days ago
I'm the founder of https://doctly.ai, also a PDF extraction service.
Hallucination in LLM extraction is much more subtle, because the model will sometimes rewrite entire sentences. That is much harder to spot when reading the document, and it sounds very plausible.
We're currently working on a version where we send the document to two different LLMs, and use a third if they don't match, to increase confidence. That way you have the option of trading compute and cost for accuracy.
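Roughly, the cross-check looks like this. A minimal sketch only: the model ids, the extract() stub, and the similarity threshold are illustrative assumptions, not our actual implementation.

    # Sketch of the two-model cross-check with a third model as tiebreaker.
    # extract(), the model ids, and the 0.98 threshold are placeholders.
    from difflib import SequenceMatcher

    def extract(model: str, document: bytes) -> str:
        """Placeholder: send the document to the given model, return text."""
        raise NotImplementedError

    def similar(a: str, b: str, threshold: float = 0.98) -> bool:
        # Cheap character-level similarity; a real system might diff
        # sentence by sentence to localize disagreements instead.
        return SequenceMatcher(None, a, b).ratio() >= threshold

    def cross_checked_extract(document: bytes) -> tuple[str, str]:
        a = extract("model-a", document)
        b = extract("model-b", document)
        if similar(a, b):
            return a, "high"      # two independent models agree
        c = extract("model-c", document)  # tiebreaker, extra cost
        if similar(c, a):
            return a, "medium"
        if similar(c, b):
            return b, "medium"
        return a, "low"           # no agreement; flag for human review

The fuzzy match is the tricky part: two models will almost never produce byte-identical output, so "agreement" has to be a similarity threshold rather than string equality.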
>We're currently working on a version where we send the document to two different LLMs, and use a third if they don't match, to increase confidence.
I’m interested to hear more about the validation process here. In my limited experience, I’ve sent the same “document” to multiple LLMs and gotten differing results, and sometimes the “right” answer was in the minority of responses. Over a large sample (same general intent of document, but very different possible formats of the information within), there was no definitive winner. We’re still working on this.
What if you used a different prompt to check the result? Did that work? I was thinking of using this approach, but now I think it may be better to use two different LLMs, like you do.
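For concreteness, here's the kind of prompt-based self-check I had in mind. A sketch only: complete() and the prompt wording are placeholders, not any specific API.

    # One model extracts, then a second prompt (same or different model)
    # verifies the extraction against the source document.
    def complete(prompt: str) -> str:
        """Placeholder for whatever LLM client is in use."""
        raise NotImplementedError

    def extract_then_verify(document_text: str) -> tuple[str, bool]:
        extraction = complete(
            "Extract the full text of this document verbatim:\n" + document_text
        )
        verdict = complete(
            "Compare the extraction to the source. Reply YES only if every "
            "sentence matches the source, otherwise reply NO.\n"
            "Source:\n" + document_text + "\n\nExtraction:\n" + extraction
        )
        return extraction, verdict.strip().upper().startswith("YES")

The catch, per the discussion above, is that a checker prompt on the same model can share the extractor's blind spots, which is the argument for a second independent model.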