Comment by thiht

5 months ago

Or, depending on your use case, you do it in one step and ask an LLM to extract data from a PDF.

What you describe is obviously better and a lot more robust, but the LLM-only approach is not "wrong". It’s simple, fast, and easy to set up and understand, and it works: with less accuracy, but it does work. Depending on your constraints, development budget, and load, it’s a perfectly acceptable solution.

We did this to handle 2000 documents per month and are satisfied with the results. If we need to upgrade to something better in the future we will, but in the meantime, it’s done.
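
For anyone wondering what "one step" can look like, here is a rough sketch, not our actual code. It assumes the PDF's text is already available as a string, that `llm` is whatever function you use to send a prompt and get the model's reply back as text, and that the field names are placeholders for your own. The only safeguard is checking that the reply is valid JSON with the expected keys.

```python
import json
from typing import Callable

# Hypothetical field names; replace with whatever your documents contain.
REQUIRED_FIELDS = ("invoice_number", "total", "date")


def build_prompt(pdf_text: str) -> str:
    """Build a single prompt asking for all fields as one JSON object."""
    fields = ", ".join(REQUIRED_FIELDS)
    return (
        "Extract the following fields from the document below and reply "
        f"with a single JSON object with exactly these keys: {fields}. "
        "Use null for anything you cannot find.\n\n"
        f"{pdf_text}"
    )


def extract_fields(pdf_text: str, llm: Callable[[str], str]) -> dict:
    """One-step extraction.

    `llm` is an assumption: any callable that takes a prompt string and
    returns the model's reply as a string (wrap your provider's client).
    """
    reply = llm(build_prompt(pdf_text))
    try:
        data = json.loads(reply)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model did not return valid JSON: {reply!r}") from exc
    if not isinstance(data, dict):
        raise ValueError(f"expected a JSON object, got: {reply!r}")
    missing = [name for name in REQUIRED_FIELDS if name not in data]
    if missing:
        raise ValueError(f"reply is missing fields: {missing}")
    # Keep only the fields we asked for; ignore anything extra.
    return {name: data[name] for name in REQUIRED_FIELDS}
```

Documents that raise `ValueError` can be retried or set aside for a human to look at, which is about as much robustness as this approach gives you.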
