Comment by rybosome

8 months ago

Well, even if we assume for a moment that we aren’t talking about non-public data…

Then RAG that serves up knowledge already in the model’s pretraining data is still useful, because it primes the model for the specific context you want to engage it with. I can maybe see what you’re saying, i.e. why can’t the model just do a good job without being reminded? But even in that sense, any intelligence, artificial or otherwise, will do better given more context.

And that ignores the reality of data outside the model’s pretraining corpus, like every single business’ internal data.

It still makes sense to use external data storage for smaller local models, too. Their weights just can’t hold that much knowledge.
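To make the "priming" point concrete, here’s a toy sketch of the RAG flow: retrieve the most relevant documents for a query, then prepend them to the prompt so the model sees the specific context. Everything here is hypothetical illustration (the word-overlap scoring and the sample docs are stand-ins; a real system would use embeddings and a vector store):

```python
def score(query: str, doc: str) -> int:
    # Crude relevance signal: count of shared lowercase words.
    # Real pipelines would use embedding similarity instead.
    return len(set(query.lower().split()) & set(doc.lower().split()))

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Return the top-k documents by the toy relevance score.
    return sorted(docs, key=lambda d: score(query, d), reverse=True)[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    # Prepend retrieved context so the model is primed before answering.
    context = "\n".join(retrieve(query, docs))
    return f"Context:\n{context}\n\nQuestion: {query}"

# Hypothetical internal business data, outside any pretraining corpus.
internal_docs = [
    "Q3 refund policy: refunds allowed within 30 days of purchase.",
    "Office wifi password rotates every month.",
    "Q3 revenue grew 12 percent over Q2.",
]

print(build_prompt("What is the refund policy?", internal_docs))
```

The same mechanism covers both cases in the comment: it re-surfaces knowledge the model may have seen in pretraining, and it injects data the model has never seen at all.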