Comment by Satam
7 months ago
RAG feels hacky to me. We’re coming up with these pseudo-technical solutions to help but really they should be solved at the level of the model by researchers. Until this is solved natively, the attempts will be hacky duct-taped solutions.
I've described it this way to my colleagues:
RAG is a bit like having a pretty smart person take an open book test on a subject they are not an expert in. If your book has a good chapter layout and index, you probably do an ok job trying to find relevant information, quickly read it, and try to come up with an answer. But you're not going to be able to test for a deep understanding of the material. This person is going to struggle if each chapter/concept builds on the previous concept, as you can't just look up something in Chapter 10 and be able to understand it without understanding Chapters 1-9.
Fine-tuning is a bit more like having someone go off and do a PhD and specialize in a specific area. They get a much deeper understanding of the problem space and can conceptualize at a different level.
What you said about RAG makes sense, but my understanding is that fine-tuning is actually not very good at getting deeper understanding out of LLMs. It's more useful for teaching general instructions like output format rather than teaching deep concepts like a new domain of science.
This is true if you don't know what you're doing, so it is good advice for the vast majority.
Fine-tuning is just training. You can completely change the model if you want and make it learn anything you want.
But there are MANY challenges in doing so.
That's so vague I can't tell what you're suggesting. What specifically do you think needs solving at the model level? What should work differently?
There’s probably a lack of capabilities on multiple fronts. RAG might have the right general idea, but currently the retrieval seems to be too separated from the model itself. I don’t know how our brains do it, but retrieval looks to be more integrated there.
Models currently also have no way to update themselves with new info besides us putting data into their context window. They don’t learn after the initial training. It seems if they could, say, read documentation and internalize it, the need for RAG or even large context windows would decrease. Humans somehow are able to build understanding of extensive topics with what feels to be a much shorter context-window.
Don't forget the importance of data privacy. Updating a model with fresh information makes that information available to ALL users of that model. This often isn't what you want - you can run RAG against a user's private email to answer just their queries, without making that email "baked in" to the model.
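As a rough illustration of that point, a retrieval layer can scope results to a single user with a metadata filter, so private documents are only ever pulled into that user's context and never trained into the model. A minimal sketch; embed(), vector_store.search(), and llm.complete() are hypothetical stand-ins, not any particular library's API:

    # Hypothetical per-user RAG query: the private email lives in the store,
    # is retrieved only for its owner, and is never baked into the model.
    def answer_private_query(user_id: str, question: str) -> str:
        query_vec = embed(question)                       # embed the user's question
        hits = vector_store.search(
            vector=query_vec,
            top_k=5,
            filter={"owner": user_id},                    # restrict to this user's documents
        )
        context = "\n\n".join(doc.text for doc in hits)
        prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
        return llm.complete(prompt)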
I guess it's because people are not using tools enough yet. In my tests, giving the LLM access to retrieval tools works much better than trying to guess up front what the RAG pipeline would need to answer, i.e. the LLM decides whether it has all of the necessary information to answer the question. If not, let it search for it. If it still fails, let it search more :D
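A rough sketch of that loop, assuming a chat API that supports tool calls; search_docs and the llm.chat() interface are hypothetical, not a specific vendor's SDK:

    # Hypothetical tool-driven retrieval: the model decides when it needs to search.
    def answer_with_tools(question: str, max_searches: int = 3) -> str:
        messages = [{"role": "user", "content": question}]
        for _ in range(max_searches):
            reply = llm.chat(messages, tools=[search_docs_spec])
            if reply.tool_call is None:                   # model thinks it has enough info
                return reply.content
            results = search_docs(reply.tool_call.arguments["query"])
            messages.append({"role": "tool", "content": results})
        return llm.chat(messages).content                 # best effort after the search budget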
Our brains aren't really doing it either. We can't memorise all the things in the world. For us, a library or Google Search is what RAG is for an LLM.
I can answer questions off the cuff based on the weights of the neural network in my head. If I really wanted to get the right answers I would do "RAG" in the sense of looking up answers on the web or at the library and summarizing them.
For instance I have a policy that I try hard not to say anything like "most people think that..." without providing links because I work at an archive of public opinion data and if it gets out that one of our people was spouting false information about our domain, even if we weren't advertising the affiliation, that would look bad.
I think he is saying we should be making fine-tuning or similar model-altering methods easier rather than messing with bolt-on solutions like RAG.
Those are being worked on, and RAG is the duct-tape solution until they become available.
What about fresh data like an extremely relevant news headline that was published 10 minutes ago? Private data that I don’t want stored offsite but am okay trusting to an enterprise no-log API? Providing realtime context to LLMs isn’t “hacky”; model intelligence and RAG can complement each other and make advancements in tandem.
I don't think the parent's idea was to bake all information into the model, just that current RAG feels cumbersome to use (but then again, so do most things AI right now) and information access should be an intrinsic part of the model.
Is there a specific shortcoming of the model that could be improved, or are we simply seeking better APIs?
One of my favorite cases is sports chat. I'd expect ChatGPT to be able to talk about sports legends but not be able to talk about a game that happened last weekend. Copilot usually does a good job because it can look up the game on Bing and then summarize it, but the other day I asked it "What happened last week in the NFL" and it told me about a Buffalo Bills game from last year (did it know I was in the Bills geography?)
Some kind of incremental fine tuning is probably necessary to keep a model like ChatGPT up to date but I can't picture it happening each time something happens in the news.
For the current game, it seems solvable by providing it the box score and the radio commentary as context, perhaps with some additional data derived from recent games and news.
I think you’d get a close approximation of speaking with someone who was watching the game with you.
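A loose sketch of that setup; fetch_box_score(), fetch_commentary(), and llm.complete() are hypothetical placeholders for whatever live data feed and model API you have:

    # Hypothetical live-game context injection: fresh data goes into the prompt
    # at request time instead of into the model's weights.
    def chat_about_game(game_id: str, user_message: str) -> str:
        box_score = fetch_box_score(game_id)       # structured stats, updated live
        commentary = fetch_commentary(game_id)     # recent play-by-play / radio text
        prompt = (
            "You are watching this game with the user.\n"
            f"Box score:\n{box_score}\n\n"
            f"Recent commentary:\n{commentary}\n\n"
            f"User: {user_message}"
        )
        return llm.complete(prompt)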
Fwiw, I used to think this way too, but LLMs are more RAG-like internally than we initially realised. "Attention Is All You Need" ~= RAG is a big attention mechanism. Models have the reversal curse, memorisation issues, etc. I personally think of LLMs as a kind of decomposed RAG. Check out DeepMind’s RETRO paper for an even closer integration.
I guess you can imagine an LLM that contains all information there is, but it would have to be at least as big as all information there is, or it would have to hallucinate. Not to mention that you would also require it to learn everything immediately. I don't see any realistic way to reach that goal.
To reach their potential LLMs need to know how to use external sources.
Update: after some more thinking, if you required it to know information about itself, then this would lead to some paradox, I am sure.
A CL (continual learning) agent is next-generation AI.
When continual learning is properly implemented in an LLM agent format, most of these systems vanish.
The set of techniques for retrieval is immature, but it's important to note that just relying on model context or few-shot prompting has many drawbacks. Perhaps the most important is that retrieval as a task should not rely on generative outputs.
It's also subject to significantly more hallucination when the knowledge is baked into the model, vs being injected into the context at runtime.
The biggest problem with RAG is that the bottleneck for your product is now the retrieval itself (i.e., results are only as good as what your vector store sends to the LLM). This is a step backwards.
Source: built a few RAG+LLM products.
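One way to keep that bottleneck visible is to measure the retrieval step on its own, separate from generation. A rough sketch, assuming a small labelled set of (question, relevant doc ids) pairs and the same hypothetical embed()/vector_store helpers as in the sketches above:

    # Hypothetical retrieval-only evaluation: recall@k over a labelled eval set.
    # If this number is low, no downstream prompting will rescue the answers.
    def recall_at_k(eval_set: list[tuple[str, set[str]]], k: int = 5) -> float:
        hits = 0
        for question, relevant_ids in eval_set:
            results = vector_store.search(vector=embed(question), top_k=k)
            retrieved_ids = {doc.id for doc in results}
            if retrieved_ids & relevant_ids:       # at least one relevant doc retrieved
                hits += 1
        return hits / len(eval_set)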