Comment by yccheok

3 days ago

Hi,

I’m currently building a Q&A chatbot and facing challenges in addressing the following scenario:

When a user asks:

"What do you mean in your previous statement?"

How does your framework handle retrieving the correct small subset of "raw knowledge" and integrating it into the LLM for a relevant response?

Without relying on external frameworks, I’ve struggled with this issue - https://www.reddit.com/r/LocalLLaMA/comments/1gtzdid/d_optim...

I’d love to know how your framework solves this and whether it can streamline the process.

Thank you!

After a lot of experimentation, the only thing that worked in a chat-style application was to pass the last 4-5 messages (ideally the entire conversation history) and ask an LLM to rewrite the question in the context of the conversation.

Without that, it often failed when users asked something like "Can you expand point 2?" or "Give a detailed example of the above".
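A minimal sketch of that rewriting step, in case it helps. It assumes `llm` is any callable that takes a prompt string and returns text; that callable, the prompt wording, and the names `build_condense_prompt` and `condense_question` are all made up for illustration, not taken from any particular framework.

```python
from typing import Callable, List, Tuple

# (role, text) pairs, e.g. ("user", "What is RAG?"). Hypothetical shape.
Message = Tuple[str, str]


def build_condense_prompt(
    history: List[Message], question: str, max_messages: int = 5
) -> str:
    """Build a prompt asking an LLM to rewrite `question` as a standalone question."""
    # Keep only the most recent messages; guard against [-0:] returning everything.
    recent = history[-max_messages:] if max_messages > 0 else []
    transcript = "\n".join(f"{role}: {text}" for role, text in recent)
    return (
        "Rewrite the final user question so it can be understood without "
        "the conversation. Return only the rewritten question.\n\n"
        f"Conversation:\n{transcript}\n\n"
        f"Final user question: {question}\n"
        "Standalone question:"
    )


def condense_question(
    history: List[Message],
    question: str,
    llm: Callable[[str], str],  # assumption: prompt in, completion text out
    max_messages: int = 5,
) -> str:
    """Return a standalone version of `question`, using the LLM only if there is history."""
    if not history:
        return question
    return llm(build_condense_prompt(history, question, max_messages)).strip()
```

The rewritten question, not the raw "expand point 2", is then what goes to retrieval.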

My current implementation (I have 3 indexes) is to provide the query plus past messages and ask an LLM to break it down into four fields: overall ask, BM25-optimized question, keywords, and semantic-optimized question.

Then perform RAG + rerank, and pass the top N passages along with the overall ask to a second LLM call.
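Roughly, the flow described above could look like the sketch below. It assumes the first LLM call returns one "Label: value" line per field and that each query type maps to one index; the retriever and reranker callables, `QueryPlan`, and the function names are placeholders, not the actual implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Assumed output labels from the first LLM call, mapped to QueryPlan attributes.
LABELS = {
    "overall ask": "overall_ask",
    "bm25 optimized question": "bm25_query",
    "keywords": "keywords",
    "semantic optimized question": "semantic_query",
}


@dataclass
class QueryPlan:
    overall_ask: str = ""
    bm25_query: str = ""
    keywords: str = ""
    semantic_query: str = ""


def parse_query_plan(llm_output: str) -> QueryPlan:
    """Parse 'Label: value' lines; unknown labels and lines without ':' are ignored."""
    plan = QueryPlan()
    for line in llm_output.splitlines():
        label, sep, value = line.partition(":")
        attr = LABELS.get(label.strip().lower())
        if sep and attr:
            setattr(plan, attr, value.strip())
    return plan


def retrieve_and_rerank(
    plan: QueryPlan,
    retrievers: Dict[str, Callable[[str], List[str]]],  # QueryPlan attribute -> search fn
    rerank: Callable[[str, List[str]], List[str]],      # (overall ask, passages) -> ordered passages
    top_n: int = 5,
) -> List[str]:
    """Query each index with its own query, de-duplicate, rerank against the overall ask."""
    candidates: List[str] = []
    for attr, search in retrievers.items():
        query = getattr(plan, attr)
        if not query:
            continue  # the LLM left this field empty, so skip that index
        for passage in search(query):
            if passage not in candidates:
                candidates.append(passage)
    return rerank(plan.overall_ask, candidates)[:top_n]
```

The returned passages plus `plan.overall_ask` would then form the prompt for the second LLM call.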

If the user asks such a question, your agent should not invoke RAG at all, but simply answer from the conversation history. You need to focus on your orchestration step.
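As a rough illustration of that orchestration step, here is a sketch where retrieval is skipped when a router decides the history is enough. All three callables (`needs_retrieval`, `retrieve`, `generate`) are hypothetical stand-ins; in practice the router would likely be an LLM call or an agent's tool-selection step.

```python
from typing import Callable, List, Tuple

Message = Tuple[str, str]  # (role, text); hypothetical shape


def answer(
    question: str,
    history: List[Message],
    needs_retrieval: Callable[[str, List[Message]], bool],   # assumed router
    retrieve: Callable[[str], List[str]],                     # assumed RAG lookup
    generate: Callable[[str, List[Message], List[str]], str], # assumed LLM call
) -> str:
    """Answer from history alone unless the router asks for retrieval."""
    passages = retrieve(question) if needs_retrieval(question, history) else []
    return generate(question, history, passages)
```

With this shape, "What do you mean in your previous statement?" would reach `generate` with the history and an empty passage list.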

Search for ReAct agents; you can build one using either LangGraph or Bedrock Agents.