Comment by zby
7 months ago
Traditional full-text search (FTS) returns the whole document; people take over from that point and locate the interesting content within it. The problem with RAG is that it does not follow that procedure: it tries to find the interesting chunk in one step, even though since ReAct we have known that LLMs can follow the same procedure as humans.
But we need iterative RAG anyway: https://zzbbyy.substack.com/p/why-iterative-thinking-is-cruc...
For my application we use a land-and-expand strategy: a mix of BM25 and semantic search finds a chunk, and before showing it to the LLM we expand it to include everything on that page.
It works pretty well. It might benefit from including some material from the pages before and after, but it mostly solves the "isolated chunk" problem.
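For illustration, here is a minimal Python sketch of land-and-expand. It rests on assumptions the comment does not spell out: every chunk carries the page number it came from, the BM25 and semantic rankings are "mixed" using reciprocal rank fusion (RRF, k=60), and a bag-of-words cosine stands in for a real embedding model. `neighbor_pages` is a hypothetical knob for the "pages before and after" idea.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    page: int  # page the chunk was cut from (assumed to be stored at index time)
    text: str


def tokenize(text: str) -> list[str]:
    return text.lower().split()


def bm25_scores(query: str, chunks: list[Chunk], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of every chunk against the query."""
    docs = [tokenize(c.text) for c in chunks]
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(term for d in docs for term in set(d))  # document frequency per term
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            s += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores


def toy_semantic_scores(query: str, chunks: list[Chunk]) -> list[float]:
    """Placeholder for embedding similarity: cosine over bag-of-words counts."""
    q = Counter(tokenize(query))
    qn = math.sqrt(sum(v * v for v in q.values()))
    out = []
    for c in chunks:
        v = Counter(tokenize(c.text))
        vn = math.sqrt(sum(x * x for x in v.values()))
        dot = sum(q[t] * v[t] for t in q)
        out.append(dot / (qn * vn) if qn and vn else 0.0)
    return out


def rrf(score_lists: list[list[float]], k: int = 60) -> list[float]:
    """Reciprocal rank fusion: sum 1 / (k + rank) across each ranking."""
    fused = [0.0] * len(score_lists[0])
    for scores in score_lists:
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        for rank, i in enumerate(ranked, start=1):
            fused[i] += 1.0 / (k + rank)
    return fused


def land_and_expand(query: str, chunks: list[Chunk], neighbor_pages: int = 0) -> str:
    """Land on the best hybrid-ranked chunk, then return its whole page (plus optional neighbors)."""
    if not chunks:
        return ""
    fused = rrf([bm25_scores(query, chunks), toy_semantic_scores(query, chunks)])
    page = chunks[max(range(len(chunks)), key=lambda i: fused[i])].page
    keep = range(page - neighbor_pages, page + neighbor_pages + 1)
    # Expand: every chunk on the landing page (and neighbors), in document order.
    return "\n".join(c.text for c in chunks if c.page in keep)


chunks = [
    Chunk(1, "Installation requires Python 3.10."),
    Chunk(1, "Run pip install to get started."),
    Chunk(2, "The cache expires after ten minutes."),
    Chunk(2, "Set CACHE_TTL to change the expiry."),
    Chunk(3, "Logging is configured via LOG_LEVEL."),
]
print(land_and_expand("when does the cache expire", chunks))
```

The search lands on the cache chunk, but the LLM also receives the sibling chunk on page 2 that says how to change the expiry, which is exactly the context an isolated chunk would lose. Passing `neighbor_pages=1` would widen the window to pages 1 through 3.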