Comment by Tsarp

2 days ago

After a lot of experimentation, the only thing that worked in a chat-style application was to pass the last 4-5 messages (ideally the entire conversation history) and ask an LLM to rewrite the question as a standalone query in the context of the conversation.

Without that, retrieval often failed when users asked things like "Can you expand point 2?" or "Give a detailed example of the above".
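Roughly, the condensation step looks like this (a minimal sketch assuming an OpenAI-style chat completions client; the model name and prompt wording are illustrative, not my exact setup):

```python
from openai import OpenAI

client = OpenAI()

def condense_question(history: list[dict], latest_question: str, window: int = 5) -> str:
    # Keep only the last few turns so the rewrite stays focused and cheap.
    recent = history[-window:]
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in recent)
    prompt = (
        "Given the conversation below, rewrite the user's last question so it is "
        "fully self-contained (resolve references like 'point 2' or 'the above').\n\n"
        f"Conversation:\n{transcript}\n\n"
        f"Last question: {latest_question}\n\n"
        "Standalone question:"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp.choices[0].message.content.strip()
```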

The current implementation (I have 3 indexes) is to provide the query plus past messages and ask an LLM to break it down into: overall ask, BM25-optimized question, keywords, and semantic-optimized question.
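A sketch of that decomposition as a single structured LLM call (the JSON-mode prompt and field names here are just one way to express it, not my exact prompt):

```python
import json

from openai import OpenAI

client = OpenAI()

DECOMPOSE_PROMPT = """Given the conversation and the user's latest question, return JSON with:
  "overall_ask": what the user ultimately wants answered,
  "bm25_question": a keyword-heavy phrasing for lexical (BM25) search,
  "keywords": a list of standalone search terms,
  "semantic_question": a natural-language phrasing for embedding search.

Conversation:
{transcript}

Latest question: {question}
"""

def decompose_query(transcript: str, question: str) -> dict:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": DECOMPOSE_PROMPT.format(
            transcript=transcript, question=question)}],
        response_format={"type": "json_object"},
        temperature=0,
    )
    return json.loads(resp.choices[0].message.content)
```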

Then perform retrieval + reranking across the indexes, and pass the top N passages along with the overall ask into a second LLM call that generates the answer.
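Putting it together, the retrieval/rerank/answer step could look like this. The three index clients are hypothetical stand-ins for my setup, and the cross-encoder reranker is one common choice, not necessarily the one I use:

```python
from openai import OpenAI
from sentence_transformers import CrossEncoder

client = OpenAI()
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def answer(fields: dict, bm25_index, vector_index, keyword_index, top_n: int = 5) -> str:
    # Query each index with the phrasing optimized for it.
    candidates = (
        bm25_index.search(fields["bm25_question"])
        + vector_index.search(fields["semantic_question"])
        + keyword_index.search(" ".join(fields["keywords"]))
    )
    # Rerank every candidate passage against the overall ask and keep the top N.
    scores = reranker.predict([(fields["overall_ask"], p) for p in candidates])
    ranked = [p for _, p in sorted(zip(scores, candidates), reverse=True)]
    context = "\n\n".join(ranked[:top_n])
    # Second LLM call: answer the overall ask grounded in the retrieved passages.
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content":
            f"Answer using only the passages below.\n\nPassages:\n{context}\n\n"
            f"Question: {fields['overall_ask']}"}],
        temperature=0,
    )
    return resp.choices[0].message.content
```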