Comment by okdood64

16 days ago

> is going to kill a lot of RAG use cases.

I have a high-level understanding of LLMs and am a generalist software engineer.

Can you elaborate on how exactly these insanely large (and now cheap) context windows will kill a lot of RAG use cases?

If a model has a 4K-token input context and you have a document or codebase with 40K tokens, you have to split it up. The system prompt, user prompt, and output token budget all eat into that window, so you can end up with hundreds of small pieces, which typically land in a vector database for RAG-style retrieval.
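For concreteness, here's a minimal sketch of that pipeline, using chromadb as a stand-in vector store (the chunk sizes, file name, and query are all illustrative):

```python
# Rough sketch of the classic RAG pipeline: chunk, embed, retrieve.
# chromadb is just one example of a vector store; chunk sizes are illustrative.
import chromadb

def chunk(text: str, size: int = 2000, overlap: int = 200) -> list[str]:
    """Naive fixed-size chunking with some overlap so context isn't cut cold."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

# A 40K-token document easily becomes dozens of chunks at this granularity.
document = open("big_doc.txt").read()
chunks = chunk(document)

client = chromadb.Client()
collection = client.create_collection("docs")
collection.add(
    documents=chunks,
    ids=[f"chunk-{i}" for i in range(len(chunks))],
)

# At query time, pull only the few most relevant chunks that fit the budget.
results = collection.query(query_texts=["How is auth configured?"], n_results=5)
context = "\n---\n".join(results["documents"][0])
# `context` plus the user's question is what actually goes into the 4K window.
```

All the quality problems of RAG live in that retrieval step: if the right chunk doesn't come back in the top few results, the model never sees it.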

With a million tokens of context you can shove several short books, or an entire small-ish codebase, into the prompt and skip all of that.
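The long-context version collapses to a single call. Here's a sketch using the google-genai client; the model name, directory layout, and question are placeholders:

```python
# Long-context alternative: no chunking, no vector store, one big prompt.
from pathlib import Path
from google import genai

# Concatenate the whole (small-ish) codebase into a single prompt.
source = "\n\n".join(
    f"# {p}\n{p.read_text()}" for p in Path("my_project").rglob("*.py")
)

client = genai.Client()  # reads GEMINI_API_KEY from the environment
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=f"{source}\n\nQuestion: where is the retry logic implemented?",
)
print(response.text)
```

Because the model sees everything, there's no retrieval step to get wrong; the trade-off is you pay for all those input tokens on every call.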

A colleague took an HTML dump of every config and config policy from a Windows network, pasted it into Gemini, and started asking questions. It’s just that easy now!