Comment by langcss
1 year ago
This sort of approach always made more sense to me than RAG. I am less likely to try RAG than something that feeds the LLM what it actually needs. RAG risks providing piecemeal information that confuses the LLM.
The way I think this would work, and would like to try out, is to ask the LLM what info it wants next from an index of contents, like a book's table of contents. That index can be LLM-generated or not. Then backtrack, since you don't need that lookup in your dialogue any more, and insert the result.
It won't work for everything, but it should work for many "small expert" cases, and then you don't need a vector DB, you just do prompts!
Cheap LLMs perhaps make this more viable than it used to be. Use a small open source LLM for the decision making, then a quality open source or proprietary LLM for the chat or code gen. A rough sketch of what I mean is below.
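Something like this, as a minimal sketch: call_llm stands in for whatever chat API you use, and the model names are made up.

  # Ask-the-LLM-what-it-wants loop: pick a section from a table of contents,
  # then drop the lookup turn and answer from just the retrieved section.
  def answer_with_toc(question, toc, sections, call_llm):
      """toc: list of section titles; sections: dict mapping title -> text."""
      # 1. Cheap model decides which section it needs (or none).
      pick = call_llm(
          model="small-open-llm",  # placeholder
          prompt="Question: " + question
                 + "\nTable of contents:\n- " + "\n- ".join(toc)
                 + "\nReply with exactly one section title you need, or NONE.",
      ).strip()

      # 2. "Backtrack": the lookup exchange is discarded; only the retrieved
      #    text is inserted into the final prompt for the stronger model.
      context = sections.get(pick, "")
      prefix = "Reference material:\n" + context + "\n\n" if context else ""
      return call_llm(
          model="big-llm",  # placeholder
          prompt=prefix + "Question: " + question,
      )

For bigger sources you could loop step 1, letting the small model ask for a few sections before handing off.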
It's still RAG, just the R in RAG is not vector-based anymore, no?
You’re right. Many people take a mental shortcut and assume that RAG means a vector DB search. Any kind of retrieval is retrieval. You can do keyword search. You can do a PageRank-like query. You can sort content by date and send the most recent items to the LLM. It’s all retrieval. That is the R in Retrieval-Augmented Generation.
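For example, two perfectly valid non-vector retrievers, as a quick sketch (docs is assumed to be a list of {"text": ..., "date": ...} dicts):

  # Keyword-overlap retrieval: score documents by terms shared with the query.
  def keyword_retrieve(query, docs, k=3):
      terms = set(query.lower().split())
      scored = [(len(terms & set(d["text"].lower().split())), d) for d in docs]
      return [d for score, d in sorted(scored, key=lambda s: -s[0])[:k] if score > 0]

  # Recency retrieval: just send the k most recent items to the LLM.
  def recency_retrieve(docs, k=3):
      return sorted(docs, key=lambda d: d["date"], reverse=True)[:k]

Whatever comes back gets pasted into the prompt exactly the way vector-search hits would; the G part doesn't change.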
What you describe sounds like Agentic RAG: https://zzbbyy.substack.com/p/agentic-rag
> The traditional way to do RAG is to find information relevant to a query - and then incorporate it into the LLM prompt together with the question we want it to answer.
Technically this is incorrect. The original RAG paper used a seq2seq generator (BART) and proposed two variants: RAG-Sequence and RAG-Token.
RAG-Sequence uses the same retrieved documents for the whole output and appends them to the input query (note, this is different from a decoder-only model). RAG-Token can generate each token based on a different document.
I only nitpick this because if someone is going to invent new fancy-sounding variants of RAG, they should at least get the basics right.
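For reference, the two marginalizations in the Lewis et al. paper look roughly like this (z ranges over the top-k retrieved documents, p_eta is the retriever, p_theta the BART generator):

  p_{\text{RAG-Sequence}}(y \mid x) \approx \sum_{z \in \text{top-}k(p(\cdot \mid x))} p_\eta(z \mid x) \prod_{i=1}^{N} p_\theta(y_i \mid x, z, y_{1:i-1})

  p_{\text{RAG-Token}}(y \mid x) \approx \prod_{i=1}^{N} \sum_{z \in \text{top-}k(p(\cdot \mid x))} p_\eta(z \mid x)\, p_\theta(y_i \mid x, z, y_{1:i-1})

So RAG-Sequence marginalizes over documents once per output sequence, while RAG-Token does it per generated token.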
Thanks!