Comment by asdev
2 days ago
I feel like tool calling killed RAG; that said, you have less control over how the retrieved data is injected into the context.
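i.e. with classic RAG you assemble the prompt yourself. A minimal sketch of that hand-rolled injection (retrieve here is a stand-in for whatever retrieval step you have):

    # Minimal sketch of hand-rolled context injection: you decide exactly where
    # and how the retrieved chunks land in the prompt. retrieve is a placeholder
    # for any retrieval step.
    def build_prompt(question, retrieve, top_k=5):
        chunks = retrieve(question, top_k)
        context = "\n\n".join(f"[{i}] {c}" for i, c in enumerate(chunks, 1))
        return (
            "Answer using only the sources below and cite them by number.\n\n"
            f"Sources:\n{context}\n\n"
            f"Question: {question}"
        )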
Search tool calling is RAG. Maybe we should call it a "RAG Agent" to be more en vogue heh. But RAG is not just similarity search on embeddings in vector DBs. RAG is any kind of retrieval + context injection step prior to inference.
Heck, the RAG Agent could run cosign diff on your vector db in addition to grep, FTS queries, KB API calls, whatever, to do wide recall (candidate generation) and then rerank (relevance prioritization) all the results.
You are probably correct that for most use cases search tool calling makes more practical sense than embeddings similarity search to power RAG.
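Roughly the shape I have in mind, as a sketch only; the search backends and the reranking score are stand-ins for whatever you actually run, not any particular library's API:

    # Wide recall (candidate generation) from several backends, then rerank
    # (relevance prioritization). `backends` is any list of search callables
    # (cosine similarity over a vector DB, grep, FTS, a KB API, ...), and
    # `score` is whatever reranker you have; both are placeholders.
    def retrieve(query, backends, score, top_k=10):
        candidates = set()
        for search in backends:
            candidates.update(search(query))  # wide recall from each source
        ranked = sorted(candidates, key=lambda doc: score(query, doc), reverse=True)
        return ranked[:top_k]  # relevance-prioritized shortlist for the context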
> could run cosign diff on your vector db
or maybe even "cosine similarity"
word ;)
Tool calling complements RAG. You build a full-scale RAG pipeline (embedding, reranking, prompt construction, LLM output) and hook it up as a tool another agent can call. That combines the power of both.
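Something like this, very roughly; the embedder, index, reranker and LLM client here are all placeholder components, not a specific library's API:

    # A full RAG pipeline wrapped as a single tool another agent can call.
    # embed, index, rerank and llm are hypothetical components.
    def rag_tool(question, embed, index, rerank, llm, top_k=5):
        query_vec = embed(question)                  # embedding step
        candidates = index.search(query_vec, k=50)   # recall from the vector index
        best = rerank(question, candidates)[:top_k]  # reranker
        prompt = "Context:\n" + "\n".join(best) + f"\n\nQuestion: {question}"
        return llm(prompt)                           # the answer the calling agent sees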
How would you use tool calling to filter through millions of documents? You need some search functionality, whether old-school keyword search or embedding search. If you only have thousands of documents, then sure, you don't need search, since you can feed them all to the LLM.
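For the old-school flavor, even SQLite's FTS5 goes surprisingly far as the backend a search tool can call. A sketch, assuming your SQLite build ships the FTS5 extension (most do):

    # Old-school full-text search over a large corpus, usable as a tool backend.
    import sqlite3

    conn = sqlite3.connect("corpus.db")
    conn.execute("CREATE VIRTUAL TABLE IF NOT EXISTS docs USING fts5(title, body)")

    def index_doc(title, body):
        conn.execute("INSERT INTO docs (title, body) VALUES (?, ?)", (title, body))
        conn.commit()

    def search(query, limit=10):
        # bm25() scores matches; smaller values mean better matches in FTS5.
        return conn.execute(
            "SELECT title, body FROM docs WHERE docs MATCH ? ORDER BY bm25(docs) LIMIT ?",
            (query, limit),
        ).fetchall()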
I haven't built either system, but it seems clear that tool calling will be roughly O(num_targets * cost(search_tool)), while RAG will be roughly O(cost(embed_query) * num_targets).
RAG looks linear (roughly constant per lookup, since the document embeddings are precomputed) while tools look polynomial. And tool calls will possibly fill up the limited LLM context too.
You give the LLM search tools.
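Concretely, something like the declaration below (a JSON-Schema-style tool definition; exact field names vary by provider, so treat it as a sketch). When the model calls the tool, you run the search and feed the results back:

    # A search tool declaration in the JSON-Schema style most function-calling
    # APIs accept; the tool name and field layout are illustrative only.
    search_tool = {
        "name": "search_docs",
        "description": "Full-text search over the document corpus.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Search terms"},
                "limit": {"type": "integer", "description": "Max results to return"},
            },
            "required": ["query"],
        },
    }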
That's missing the point. You are hiding the search behind the tool, but it's still search. Whether you use a tool or a hardcoded workflow is irrelevant.