Comment by antirez
4 days ago
I propose a different technique:
- Use a large context LLM.
- Segment documents to roughly 25% of the context window each.
- With RAG, retrieve fragments from all the documents, then do a first-pass semantic re-ranking by sending the LLM something like this:
I have a set of documents I can show you to answer the user question "$QUESTION". Please tell me, from the titles and best-matching fragments, which document IDs you want to see to better reply:
[Document ID 0]: "Some title / synopsis. From page 100 to 200"
... best matching fragment of document 0...
... second best fragment ...
[Document ID 1]: "Some title / synopsis. From page 200 to 300"
... fragments ...
LLM output: show me 3, 5, 13.
Then issue a new query with the full selected documents attached, filling up to 75% of the context window:
"Based on the attached documents in this chat, reply to $QUESTION".
Slow and expensive, but otherwise a good idea. And inference-time compute is the new hotness.