Comment by Valk3_

6 months ago

Sorry for my lack of knowledge, but I've been wondering what if you ask a question to the RAG, where the answer to the question is not close in embedding space to the embedded question? Will that not limit the quality of the result? Or how does a RAG handle that? I guess maybe the multi-turn convo you mentioned helps in this regard?

The way I see RAG is it's basically some sort of semantic search, where the query needs to be similar to whatever you are searching for in the embedding space in order to get good results.
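To make the concern concrete, here is a minimal sketch of retrieval as nearest-neighbor search. The `embed` function is a toy bag-of-words stand-in for a real embedding model, and the documents and query are made up for illustration, so only word overlap counts as similarity here.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the k documents closest to the query in the toy embedding space."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

# Hypothetical corpus: the first document answers the question,
# the second one merely looks like the question.
docs = [
    "restart the daemon with systemctl restart nginx",
    "why is my site down is a common question on forums",
]
query = "why is my site down"

top = retrieve(query, docs)
print(top[0])  # the look-alike document wins; the actual answer scores 0.0
```

In this toy setup the document that answers the question shares no words with it, so it is never retrieved. Real embedding models capture far more than word overlap, but the same kind of mismatch can still happen when question and answer are phrased very differently.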


yencabulator 6 months ago

I think the trick is called "query expansion". You use an LLM to rewrite the query into a more verbose form, which can also include text from the chat context, and then you use that as the basis for the RAG lookup. Basically you use an LLM to give the RAG a better chance of having the query be similar to the resources.
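A rough sketch of what that could look like. Everything here is an assumption for illustration: `fake_llm_rewrite` is a hard-coded stand-in for a real LLM call, `embed` is a toy bag-of-words vector rather than a real embedding model, and the documents are made up.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def fake_llm_rewrite(query: str, chat_context: list[str]) -> str:
    """Hypothetical stand-in for an LLM that expands the query.

    A real system would prompt a model with the query and the chat history;
    here the extra terms are hard-coded to keep the sketch runnable.
    """
    extra_terms = "nginx web server not responding restart service"
    return " ".join([query, extra_terms, *chat_context])

docs = [
    "to bring the web server back restart the nginx service with systemctl",
    "my favorite site for recipes is down the street from the bakery",
]
query = "why is my site down"
chat_context = ["I am running nginx on ubuntu"]

plain_hit = retrieve(query, docs)[0]
expanded_hit = retrieve(fake_llm_rewrite(query, chat_context), docs)[0]

print("plain query    ->", plain_hit)     # the recipes document
print("expanded query ->", expanded_hit)  # the nginx document
```

The short query only overlaps with the irrelevant document, while the expanded query picks up vocabulary from the chat context and lands on the relevant one. How well this works in practice depends on how good the rewrite is.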

Valk3_ 6 months ago

Thanks for the answer! I think you are right. I've also heard of HyDE (Hypothetical Document Embeddings), where an LLM writes a guess at the answer and that guess is embedded and used for the lookup instead of the raw question, which may also improve the results.
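A minimal sketch of the HyDE idea as I understand it: search with the embedding of a guessed answer rather than the embedding of the question. As assumptions for illustration, `fake_llm_answer` is a hard-coded stand-in for a real LLM call, `embed` is a toy bag-of-words vector, and the documents are made up.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def nearest(vector: Counter, docs: list[str]) -> str:
    """Return the document whose toy embedding is closest to the given vector."""
    return max(docs, key=lambda d: cosine(vector, embed(d)))

def fake_llm_answer(query: str) -> str:
    """Hypothetical stand-in for an LLM that drafts a plausible answer.

    The draft does not need to be correct; it only needs to look like
    the kind of document that would contain the real answer.
    """
    return "restart the nginx service with systemctl to bring the server back"

docs = [
    "to bring the web server back restart the nginx service with systemctl",
    "my favorite site for recipes is down the street from the bakery",
]
query = "why is my site down"

question_hit = nearest(embed(query), docs)               # search with the question
hyde_hit = nearest(embed(fake_llm_answer(query)), docs)  # search with the guessed answer

print("question embedding ->", question_hit)
print("HyDE embedding     ->", hyde_hit)
```

The guessed answer is written in the vocabulary of answers, so it sits closer to the real answer document than the question does. The obvious risk is that a badly wrong guess can pull retrieval toward the wrong documents.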