Comment by antirez
4 days ago
I propose a different technique:
- Use a large context LLM.
- Segment documents to roughly 25% of the context window each.
- With RAG, retrieve fragments from all the documents, then do a first-pass semantic re-ranking by sending the LLM something like this:
I have a set of documents I can show you to answer the user question "$QUESTION". Please tell me, from the titles and best-matching fragments, which document IDs you want to see to better reply:
[Document ID 0]: "Some title / synopsis. From page 100 to 200"
... best matching fragment of document 0...
... second best fragment ...
[Document ID 1]: "Some title / synopsis. From page 200 to 300"
... fragments ...
LLM output: show me 3, 5, 13.
Then issue a new query with the full selected documents attached, filling up to 75% of the context window:
"Based on the attached documents in this chat, reply to $QUESTION".
Slow and expensive, but otherwise a good idea. And inference-time compute is the new hotness.