Comment by jiggawatts

15 days ago

If a model has a 4K-token input context and you have a document or codebase of 40K tokens, you have to split it up. The system prompt, user prompt, and output token budget all eat into that window. You might end up with hundreds of small pieces, which typically land in a vector database for RAG retrieval.
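The splitting step looks roughly like this. A minimal sketch, using word counts as a stand-in for tokens (a real pipeline would use the model's tokenizer) and made-up chunk sizes:

```python
def chunk_text(text, chunk_size=800, overlap=100):
    """Split text into overlapping word-based chunks.

    Word counts approximate tokens here; chunk_size and overlap
    are illustrative, not tuned values.
    """
    words = text.split()
    chunks = []
    step = chunk_size - overlap  # slide forward, keeping some overlap for context
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks

# A ~40K-word document produces dozens of pieces,
# each small enough to fit a 4K context alongside the prompts.
doc = "word " * 40_000
pieces = chunk_text(doc)
print(len(pieces))
```

Each of those pieces then gets embedded and indexed so the retriever can pull back the handful most relevant to a query.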

With a million tokens you can shove several short books into the prompt and just skip all that. That’s an entire small-ish codebase.

A colleague took an HTML dump of every config and config policy from a Windows network, pasted it into Gemini, and started asking questions. It’s just that easy now!