
Comment by scottydelta

17 days ago

This is what I am trying to figure out how to solve.

My problem statement is:

- Ingest PDFs, summarize, and extract important information.

- Have some way to overlay the extracted information on the PDF in the UI.

- Let the user give feedback by accepting or rejecting each overlaid highlight as useful or not.

- This feedback goes back into the model for reinforcement learning.

Hoping to find something that can make this more manageable.

Most PDF parsers give you coordinate data (bounding boxes) for extracted text. Use these to draw highlights in your PDF viewer; users can then click a highlight to confirm or reject the extraction.

The tricky part is maintaining a mapping between your LLM extractions and these coordinates.
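A minimal sketch of that mapping step, assuming you already have word-level boxes as `(x0, y0, x1, y1, text)` tuples (roughly the first five fields of PyMuPDF's `page.get_text("words")`, though any parser's output can be massaged into this shape):

```python
# Map an extracted phrase back to page coordinates by finding the
# consecutive run of word boxes whose text matches the phrase, then
# taking the union of their bounding boxes.

def phrase_bbox(words, phrase):
    """Return (x0, y0, x1, y1) covering the phrase, or None if not found."""
    target = phrase.split()
    texts = [w[4] for w in words]
    for i in range(len(texts) - len(target) + 1):
        if texts[i:i + len(target)] == target:
            run = words[i:i + len(target)]
            return (min(w[0] for w in run), min(w[1] for w in run),
                    max(w[2] for w in run), max(w[3] for w in run))
    return None  # no verbatim match; a fuzzy-matching fallback would go here

words = [
    (10, 10, 40, 20, "Total"),
    (45, 10, 80, 20, "amount:"),
    (85, 10, 120, 20, "$42.00"),
]
print(phrase_bbox(words, "Total amount:"))  # (10, 10, 80, 20)
```

Exact string matching like this breaks when the LLM paraphrases, which is exactly why a second LLM pass (below) or fuzzy matching is usually needed.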

One way to do it would be with two LLM passes:

  1. First pass: Extract all important information from the PDF
  2. Second pass: "Hey LLM, find where each extraction appears in these bounded text chunks"

Not the cheapest approach since you're hitting the API twice, but it's straightforward!