English Wikipedia is ~4.8B words, and the model's knowledge cutoff is six months. A valid use case is to search across Wikipedia and ground your answers.
Trivially proves that RAG is still needed.
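The retrieval half of that is genuinely small. A minimal sketch in Python using the public MediaWiki search API; `ask_llm` is a hypothetical stand-in for whatever completion endpoint you use:

```python
# Minimal RAG-over-Wikipedia sketch: search first, then stuff the top
# snippets into the prompt so answers are grounded in current text
# rather than a 6-month-old training set.
import requests

def search_wikipedia(query: str, limit: int = 3) -> list[str]:
    """Return top search snippets from the public MediaWiki search API."""
    resp = requests.get(
        "https://en.wikipedia.org/w/api.php",
        params={
            "action": "query",
            "list": "search",
            "srsearch": query,
            "srlimit": limit,
            "format": "json",
        },
        timeout=10,
    )
    resp.raise_for_status()
    # Snippets come back with HTML highlight markup; fine for a sketch.
    return [hit["snippet"] for hit in resp.json()["query"]["search"]]

def grounded_answer(question: str) -> str:
    context = "\n\n".join(search_wikipedia(question))
    prompt = f"Answer using only this context:\n\n{context}\n\nQ: {question}"
    return ask_llm(prompt)  # hypothetical: your completion API here
```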
This is only for the small model. The medium model is still at 1M (like Gemini 2.5).
Even if we could get the mid-size models to 10M, that still only covers a medium-sized repo at best. Repo growth will also accelerate as LLMs generate more code. There's no way to catch up.
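Back-of-envelope on that, assuming ~10 tokens per line of code (varies a lot by language and tokenizer; the Chromium figure is an often-cited ballpark):

```python
# Why even a 10M-token window doesn't keep up with real codebases.
TOKENS_PER_LOC = 10         # assumed average; varies by tokenizer/language
WINDOW = 10_000_000         # hypothetical 10M-token context

repos = [
    ("medium repo", 500_000),            # ~500k LOC
    ("Chromium-scale repo", 30_000_000), # tens of millions of LOC
]
for name, loc in repos:
    tokens = loc * TOKENS_PER_LOC
    verdict = "fits" if tokens <= WINDOW else "blows past"
    print(f"{name}: {loc:,} LOC ≈ {tokens / 1e6:.0f}M tokens ({verdict} the window)")
# medium repo: 500,000 LOC ≈ 5M tokens (fits the window)
# Chromium-scale repo: 30,000,000 LOC ≈ 300M tokens (blows past the window)
```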
RAG still has lots of benefits for anyone paying per input token (e.g. over APIs).
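Rough math on the bill, with an illustrative price (not any particular vendor's rate):

```python
# Input-token cost per query: whole-corpus prompt vs. a few retrieved chunks.
PRICE_PER_M_INPUT = 3.00   # assumed $ per 1M input tokens, illustrative only

scenarios = [
    ("full 10M-token context", 10_000_000),
    ("RAG, 4 chunks x ~1k tokens", 4_000),
]
for name, tokens in scenarios:
    cost = tokens / 1e6 * PRICE_PER_M_INPUT
    print(f"{name}: ${cost:.4f} per query")
# full 10M-token context: $30.0000 per query
# RAG, 4 chunks x ~1k tokens: $0.0120 per query
```

That's a ~2500x difference per query, before prompt caching even enters the picture.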
Not to mention latency.
And grounding for the model. Smaller models with grounding tend to hallucinate a little less (anecdotally).
RAG scales up as everything else scales up. Flooding prompts with garbage is not a sound strategy...