Comment by cccybernetic
17 days ago
Most PDF parsers give you coordinate data (bounding boxes) for the text they extract. Use these to draw highlights on top of the page in your PDF viewer - users can then click a highlight to check whether the extraction was correct.
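To make the highlight part concrete, here's a rough sketch of turning a parser's bounding box into a rectangle you can overlay on a rendered page. It assumes the parser reports boxes in PDF points with the origin at the bottom-left of the page (the PDF default, but some parsers already flip the y-axis, so check yours). `BBox`, `to_viewer_rect`, and the `scale` parameter are made-up names for illustration, not any library's API.

```python
from dataclasses import dataclass


@dataclass
class BBox:
    # Assumption: PDF points, origin at the bottom-left corner of the page.
    x0: float  # left edge
    y0: float  # bottom edge
    x1: float  # right edge
    y1: float  # top edge


def to_viewer_rect(box: BBox, page_height_pt: float, scale: float) -> dict:
    """Convert a PDF-space box to a top-left-origin pixel rectangle.

    `scale` is pixels per PDF point in the rendered page
    (e.g. 2.0 if the page was rendered at 144 DPI, since 72 pt = 1 inch).
    """
    return {
        "left": box.x0 * scale,
        # Flip the y-axis: the viewer measures down from the top of the page.
        "top": (page_height_pt - box.y1) * scale,
        "width": (box.x1 - box.x0) * scale,
        "height": (box.y1 - box.y0) * scale,
    }


# A box one inch from the left on a US Letter page (792 pt tall), rendered at 2x.
rect = to_viewer_rect(BBox(72, 700, 172, 720), page_height_pt=792, scale=2.0)
```

The resulting dict maps directly onto an absolutely positioned overlay element, which is what makes the highlights clickable.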
The tricky part is maintaining a mapping between your LLM extractions and these coordinates.
One way to do it would be with two LLM passes:
1. First pass: Extract all important information from the PDF
2. Second pass: "Hey LLM, find where each extraction appears in these bounded text chunks"
Not the cheapest approach since you're hitting the API twice, but it's straightforward!
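Roughly, the second pass could look like the sketch below. It assumes your parser gives you chunks with an `id`, `text`, and `bbox`, and that `call_llm` is whatever function you already use to send a prompt and get a string back (it's a placeholder here, not a real client). The prompt wording and the JSON reply shape are also just assumptions; adjust to what your model handles well.

```python
import json
from typing import Callable, Dict, List


def build_locate_prompt(extractions: Dict[str, str], chunks: List[dict]) -> str:
    """Second-pass prompt: ask which chunk each extracted value came from."""
    lines = [
        "For each extracted field, return the id of the chunk that contains it.",
        'Answer with JSON only, shaped like {"field_name": chunk_id}.',
        "",
        "Chunks:",
    ]
    for chunk in chunks:
        lines.append(f"[{chunk['id']}] {chunk['text']}")
    lines += ["", "Extractions:"]
    for field, value in extractions.items():
        lines.append(f"{field}: {value}")
    return "\n".join(lines)


def locate_extractions(
    extractions: Dict[str, str],
    chunks: List[dict],
    call_llm: Callable[[str], str],  # hypothetical: prompt in, raw text out
) -> Dict[str, list]:
    """Map each extracted field to the bounding box of its source chunk."""
    raw = call_llm(build_locate_prompt(extractions, chunks))
    mapping = json.loads(raw)  # raises if the model did not return valid JSON
    by_id = {chunk["id"]: chunk for chunk in chunks}
    located = {}
    for field, chunk_id in mapping.items():
        # Ignore fields we never asked about and ids that don't exist.
        if field in extractions and isinstance(chunk_id, int) and chunk_id in by_id:
            located[field] = by_id[chunk_id]["bbox"]
    return located


# Usage with a stand-in for the real API call:
chunks = [
    {"id": 1, "text": "Acme Corp", "bbox": [72, 700, 172, 720]},
    {"id": 2, "text": "Total due: $1,250.00", "bbox": [72, 100, 260, 118]},
]
extractions = {"vendor": "Acme Corp", "invoice_total": "$1,250.00"}
fake_llm = lambda prompt: '{"vendor": 1, "invoice_total": 2, "bogus": 9}'
located = locate_extractions(extractions, chunks, fake_llm)
```

Asking for chunk ids rather than coordinates keeps the model from having to reproduce numbers, and anything it returns that doesn't match a real chunk just gets dropped instead of producing a bogus highlight.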
Here's a PR that hasn't been merged yet, for some reason, but seems to be having some success with the bounding boxes:
https://github.com/getomni-ai/zerox/pull/44
Related to this issue:
https://github.com/getomni-ai/zerox/issues/7