Comment by lyjackal
17 days ago
If the end goal is just RAG or search over the PDFs, ColPali-based embedding search seems like a good alternative here. Don't process the PDFs at all; instead, embed each page as an image and search those embeddings directly. From what I understand, you also get a sort of attention map showing which parts of the page image the query activates.
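
To make that a bit more concrete, here is a rough sketch of the late-interaction (MaxSim) scoring that ColPali-style retrieval relies on. It assumes you already have per-token query embeddings and per-patch page embeddings from the model; the toy arrays and the function names (`maxsim_score`, `patch_heatmap`, `rank_pages`) are made up for illustration and are not any library's API.

```python
import numpy as np

def maxsim_score(query_emb, page_emb):
    """Late-interaction score between one query and one page image.

    query_emb: (n_query_tokens, dim), page_emb: (n_patches, dim).
    Each query token picks its best-matching patch; the picks are summed.
    """
    sim = query_emb @ page_emb.T          # (n_query_tokens, n_patches)
    return float(sim.max(axis=1).sum())

def patch_heatmap(query_emb, page_emb, grid):
    """Per-patch relevance: the strongest match any query token has with each patch.

    grid is the (rows, cols) patch layout of the page image (assumed known).
    """
    sim = query_emb @ page_emb.T
    return sim.max(axis=0).reshape(grid)

def rank_pages(query_emb, pages):
    """Return page indices sorted from best to worst MaxSim score."""
    scores = [maxsim_score(query_emb, p) for p in pages]
    return sorted(range(len(pages)), key=lambda i: scores[i], reverse=True)

# Toy stand-ins for real model output: 2 query tokens, 4 patches per page, dim 2.
query = np.eye(2)
page_a = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]])
page_b = np.array([[0.5, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]])

ranking = rank_pages(query, [page_a, page_b])
heat = patch_heatmap(query, page_a, (2, 2))
```

The heatmap is the "what part of the image is being activated" piece: it is just the same similarity matrix reduced over query tokens instead of over patches, so it comes for free with the search.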