Comment by scottydelta
17 days ago
This is what I am trying to figure out how to solve.
My problem statement is:
- Ingest PDFs, summarize, and extract important information.
- Have some way to overlay the extracted information on the PDF in the UI.
- Users can give feedback on the overlaid info by accepting or rejecting each highlight as useful or not.
- That feedback goes back into the model for reinforcement learning.
Hoping to find something that can make this more manageable.
Most PDF parsers give you coordinate data (bounding boxes) for extracted text. Use these to draw highlights in your PDF viewer; users can then click a highlight to confirm whether the extraction was correct.
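As a rough illustration (my choice of library, not something from this thread), here's a minimal sketch with PyMuPDF; the file name and search string are placeholders:

```python
# Sketch only: PyMuPDF (pip install pymupdf) is one option; any parser that
# returns text coordinates works the same way.
import fitz  # PyMuPDF

doc = fitz.open("document.pdf")   # placeholder path
page = doc[0]

# Every extracted word comes with its bounding box:
# (x0, y0, x1, y1, word, block_no, line_no, word_no)
words = page.get_text("words")
print(words[:3])

# Suppose this string is one of your LLM extractions.
target = "total amount due"       # placeholder extraction

# search_for() returns the rectangles covering each match on the page;
# add_highlight_annot() draws a highlight over each one.
for rect in page.search_for(target):
    page.add_highlight_annot(rect)

doc.save("highlighted.pdf")
```

If you render pages yourself in the UI instead of annotating the PDF, the same rectangles just need to be scaled from PDF points to your rendered pixel size before drawing the overlay.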
The tricky part is maintaining a mapping between your LLM extractions and these coordinates.
One way to do it would be with two LLM passes: a first pass to extract the important information, and a second pass to quote the exact source text for each extraction so you can look up its coordinates in the parser output.
Not the cheapest approach since you're hitting the API twice, but it's straightforward!
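Something along these lines; call_llm() is a hypothetical stand-in for whatever client you use, and the prompts are only meant to show the shape of the two passes:

```python
# Hypothetical two-pass sketch; wire call_llm() up to your actual LLM client.
import json

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

page_text = "...full text extracted from the PDF page..."  # placeholder

# Pass 1: extract the important information as a JSON list of strings.
extractions = json.loads(call_llm(
    "Extract the key facts from this page as a JSON list of strings:\n" + page_text
))

# Pass 2: for each extraction, ask for the verbatim snippet it came from,
# which can then be located in the parser's word/coordinate data
# (e.g. via page.search_for() in the snippet above).
mapping = {}
for item in extractions:
    snippet = call_llm(
        f"Quote the exact text from the page that supports: {item!r}\n\n{page_text}"
    )
    mapping[item] = snippet.strip()
```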
Here's a PR that hasn't been accepted yet for some reason, but it seems to be having some success with bounding boxes:
https://github.com/getomni-ai/zerox/pull/44
Related to
https://github.com/getomni-ai/zerox/issues/7
Have you tried Cursor or Replit for this?