> The traditional way to do RAG is to find information relevant to a query - and then incorporate it into the LLM prompt together with the question we want it to answer.
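In pseudocode, the quoted pattern amounts to something like the sketch below. `retrieve`, `build_prompt`, and `call_llm` are hypothetical stand-ins for illustration, not any particular library's API:

```python
def retrieve(query: str, corpus: list[str], k: int = 3) -> list[str]:
    # Toy lexical scorer: rank documents by word overlap with the query.
    # A real system would use BM25 or a dense-embedding vector index.
    q_words = set(query.lower().split())
    ranked = sorted(corpus, key=lambda d: -len(q_words & set(d.lower().split())))
    return ranked[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    # Stuff the retrieved passages into the prompt alongside the question.
    context = "\n\n".join(docs)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

def call_llm(prompt: str) -> str:
    # Placeholder for a call to a decoder-only LLM.
    return "<model output>"

corpus = [
    "RAG combines a retriever with a generator.",
    "BART is a seq2seq model.",
    "The sky is blue.",
]
query = "What does RAG combine?"
print(call_llm(build_prompt(query, retrieve(query, corpus))))
```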
Technically, the quoted description is incorrect. The original RAG paper (Lewis et al., 2020) used a seq2seq generator (BART) and proposed two decoding methods: RAG-Sequence and RAG-Token.
RAG-Sequence uses the same retrieved document for the entire generated sequence, concatenating it with the encoder input (note: this is a seq2seq encoder input, not a decoder-only prompt). RAG-Token can draw on a different retrieved document for each generated token.
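For reference, here is exactly where the two schemes differ: both marginalize over the top-k retrieved documents z, but at different points (notation as in Lewis et al., 2020):

```latex
% RAG-Sequence: one document z conditions the whole output sequence;
% the sequence likelihoods are then marginalized over documents.
p_{\text{RAG-Sequence}}(y \mid x) \approx
  \sum_{z \in \mathrm{top}\text{-}k(p_\eta(\cdot \mid x))}
  p_\eta(z \mid x) \prod_{i=1}^{N} p_\theta(y_i \mid x, z, y_{1:i-1})

% RAG-Token: the marginalization happens per token, so each token
% can be generated from a different retrieved document.
p_{\text{RAG-Token}}(y \mid x) \approx
  \prod_{i=1}^{N} \sum_{z \in \mathrm{top}\text{-}k(p_\eta(\cdot \mid x))}
  p_\eta(z \mid x) \, p_\theta(y_i \mid x, z, y_{1:i-1})
```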
I only nitpick this because if someone is going to invent new fancy-sounding variants of RAG, they should at least get the basics right.
Traditional does not equate to original. The original technique was never widely used, so it cannot be called the traditional way.