Comment by MikeNotThePope

7 hours ago

Is it ever useful to have a context window that full? I try to keep usage under 40%, or about 80k tokens, to avoid what Dex Horthy calls the dumb zone in his research-plan-implement approach. Works well for me so far.

No vibes allowed: https://youtu.be/rmvDxxNubIg?is=adMmmKdVxraYO2yQ

I'd been on Codex for a while and with Codex 5.2 I:

1) No longer found the dumb zone

2) No longer feared compaction

Since switching to Opus for stupid political reasons, I still haven't hit the dumb zone - but I'm back to disliking compaction events, so its smaller context window has really hurt.

I hope they copy OpenAI's compaction magic soon, but I am also very excited to try the longer context window.

  • 1m context in OpenAI and Gemini is just marketing. Opus is the only model to provide real usable big context.

    • I'm directly conveying my actual experience to you. I have tasks that fill up Opus context very quickly (at the 200k context) and which took MUCH longer to fill up Codex since 5.2 (which I think had 400k context at the time).

      This is direct comparison. I spent months subscribed to both of their $200/mo plans. I would try both and Opus always filled up fast while Codex continued working great. It's also direct experience that Codex continues working great post-compaction since 5.2.

      I don't know about Gemini but you're just wrong about Codex. And I say this as someone who hates reporting these facts because I'd like people to stop giving OpenAI money.

    • Source? I ask because I use 500k+ context on these on a daily basis.

      Big refactorings guided by automated tests eat context window for breakfast.

    • Codex high reasoning has been a legitimately excellent tool for generating feedback on every plan Claude opus thinking has created for me.

  • This is true.

    When I am using Codex, compaction isn't something I fear; it feels like saving your game progress and moving on.

    For Claude Code, compaction feels disastrous, and it also takes much longer.

Thanks for the video.

His fix for "the dumb zone" is the RPI Framework:

● RESEARCH. Don't code yet. Let the agent scan the files first. Docs lie. Code doesn't.

● PLAN. The agent writes a detailed step-by-step plan. You review and approve the plan, not just the output. Dex calls this avoiding "outsourcing your thinking." The plan is where intent gets compressed before execution starts.

● IMPLEMENT. Execute in a fresh context window. The meta-principle he calls Frequent Intentional Compaction: don't let the chat run long. Ask the agent to summarize state, open a new chat with that summary, keep the model in the smart zone.
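
The "Frequent Intentional Compaction" step can be sketched as a simple loop. Everything here is an assumption for illustration - the `agent.complete` client, the 4-characters-per-token heuristic, and the 80k budget - not a real API:

```python
# Sketch of "frequent intentional compaction" (hypothetical agent API).
# When the transcript nears a token budget, ask the model to summarize
# state, then restart a fresh conversation seeded with that summary.

TOKEN_BUDGET = 80_000  # ~40% of a 200k window, the "smart zone" rule of thumb

def approx_tokens(messages):
    # Rough heuristic: ~4 characters per token for English text and code.
    return sum(len(m["content"]) for m in messages) // 4

def chat_with_compaction(agent, messages, user_msg):
    messages.append({"role": "user", "content": user_msg})
    if approx_tokens(messages) > TOKEN_BUDGET:
        # Intentional compaction: summarize, then start a fresh context.
        summary = agent.complete(messages + [{
            "role": "user",
            "content": "Summarize the task state, key decisions, and open items.",
        }])
        messages = [
            {"role": "system", "content": "Prior context summary:\n" + summary},
            {"role": "user", "content": user_msg},
        ]
    reply = agent.complete(messages)
    messages.append({"role": "assistant", "content": reply})
    return messages, reply
```

The point is that you choose the compaction moment and the summary prompt, rather than letting the tool truncate arbitrarily mid-task.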

Yes. I've recently become a convert.

For me, it's less about being able to look back ~800k tokens. It's about being able to keep a conversation flowing a lot longer without forcing compaction. Generally, I really only need the most recent ~50k tokens, but having the old context sitting around is helpful.

  • Also, when you hit compaction at 200k tokens, that was probably when things were just getting good. The plan was in its final stage. The context had the hard-fought nuances discovered in the final moment. Or the agent just discovered some tiny important details after a crazy 100k token deep dive or flailing death cycle.

    Now you have to compact and you don’t know what will survive. And the built-in UI doesn’t give you good tools like deleting old messages to free up space.

    I’ll appreciate the 1M token breathing room.

    • I've found compaction kills the whole thing. Important debug steps go completely missing, and the AI loops back round thinking it's found a solution when we've already done that step.

When running long autonomous tasks it is quite frequent to fill the context, even several times. You are out of the loop so it just happens if Claude goes a bit in circles, or it needs to iterate over CI reds, or the task was too complex. I'm hoping a long context > small context + 2 compacts.

  • Yep I have an autonomous task where it has been running for 8 hours now and counting. It compacts context all the time. I’m pretty skeptical of the quality in long sessions like this so I have to run a follow on session to critically examine everything that was done. Long context will be great for this.

  • I haven't figured out how to make use of tasks running that long yet, or maybe I just don't have a good use case for it yet. Or maybe I'm too cheap to pay for that many API calls.

    • My change cuts across multiple systems with many tests/static analysis/AI code reviews happening in CI. The agent keeps pushing new versions and waits for results until all of them come up clean, taking several iterations.

    • I mean if you don't have your company paying for it I wouldn't bother... We are talking sessions of 500-1000 dollars in cost.

  • All of those things are smells imo; you should be very wary of any code output from a task that causes that much thrashing. In most cases it's better to rewind or reset and adapt your prompt to avoid the looping (which usually means a more narrowly defined scope)
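
The push-and-wait-for-CI loop described a couple of comments up can be sketched like this; `run_ci` and `ask_agent_to_fix` are hypothetical hooks standing in for a real CI poll and agent prompt:

```python
# Sketch of an autonomous "iterate until CI is green" loop.
# run_ci() returns a list of failing checks (empty when clean);
# ask_agent_to_fix(failures) stands in for prompting the agent
# and pushing a new revision. Both are hypothetical hooks.

def iterate_until_green(run_ci, ask_agent_to_fix, max_rounds=10):
    for rounds in range(max_rounds):
        failures = run_ci()
        if not failures:
            return rounds  # number of fix iterations it took
        ask_agent_to_fix(failures)
    raise RuntimeError("CI still red after max_rounds iterations")
```

The `max_rounds` cap is the important part: it bounds exactly the kind of thrashing the comment above warns about.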

Since I've yet to seriously dive into vibe coding or AI-assisted coding: does the IDE experience offer a running tally of the context size, so you know when you're getting close to (or entering) the "dumb zone"?

  • In Claude Code I believe it's /context, and it'll give you a graphical representation of what's taking up context space

  • The 2 I know, Cursor and Claude Code, will give you a percentage used for the context window. So if you know the size of the window, you can deduce the number of tokens used.

  • Cline gives you such a thing. You don't really know where the dumb zone is by the numbers, though, only by feel.

  • > Since I'm yet to seriously dive into vibe coding or AI-assisted coding

    Unless you’re using a text editor as an IDE, you probably already have.
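
The "deduce the tokens from the percentage" arithmetic mentioned above is a one-liner worth keeping as a helper. The 200k window and 40% threshold are the numbers from this thread, not universal constants:

```python
# Back-of-envelope helpers for tracking context usage when the tool
# only reports a percentage. The 200k window and the 40% "dumb zone"
# threshold come from this thread, not from any official spec.

def tokens_used(percent_used, window_tokens=200_000):
    """Deduce approximate tokens consumed from the reported percentage."""
    return int(percent_used / 100 * window_tokens)

def in_dumb_zone(percent_used, threshold_percent=40):
    """Heuristic: past ~40% of the window, quality reportedly degrades."""
    return percent_used > threshold_percent
```

So 40% of a 200k window is the ~80k-token ceiling mentioned at the top of the thread.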

It's kind of like having a 16 gallon gas tank in your car versus a 4 gallon tank. You don't need the bigger one the majority of the time, but the range anxiety that comes with the smaller one and annoyance when you DO need it is very real.

  • It seems possible, say a year or two from now, that context becomes more like a smart human with a “small” vs “medium” vs “large” working memory. The small fellow can play some popular songs on the piano, the medium one plays in an orchestra professionally, and the x-large is like Wagner composing the Der Ring marathon opera. This is my current, admittedly not well informed, mental model anyway. Well, at least we know we’ve got a little more time before the singularity :)

    • It’s more like the size of the desk the AI has to put sheets of paper on as a reference while it builds a Lego set. More desk area/context size = able to see more reference material = can do more steps in one go. I’ve lately been building checklists and having the LLM complete and check off a few tasks at a time, compacting in-between. With a large enough context I could just point it at a PLAN.md and tell it to go to work.
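
The checklist workflow in the comment above can be sketched with plain markdown checkboxes; the `- [ ]` PLAN.md format and the batch size are assumptions:

```python
import re

# Sketch of the checklist workflow: pull the next few unchecked items
# from a PLAN.md, hand them to the agent, and check them off afterwards.
# The "- [ ]" markdown checkbox format is an assumption.

def next_tasks(plan_md, batch=3):
    """Return the next `batch` unchecked checklist items."""
    return re.findall(r"^- \[ \] (.+)$", plan_md, flags=re.M)[:batch]

def check_off(plan_md, task):
    """Mark a single task as done in the checklist text."""
    return plan_md.replace(f"- [ ] {task}", f"- [x] {task}", 1)
```

With a large enough window you could skip the batching and point the agent at the whole file, which is the trade-off the comment is describing.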

After running a context window up high (probably near 70% on Opus 4.6 High) and watching it take 20% bites out of my 5hr quota per prompt, I've been experimenting with dumping context after completing a task. Seems to be working OK. I wonder if I was running into the long-context premium. Would that apply to Pro subs, or is it just relevant to API pricing?

Looking at this URL: is that a typo, or did YouTube flip the si tracking parameter?

  youtu.be/rmvDxxNubIg?is=adMmmKdVxraYO2yQ

I mean, try using Copilot on any substantial back-end codebase and watch it eat 90+% of the context just building a plan/checklist. Of course, Copilot is constrained to 120k, I believe? So having 10x that will blow open some doors that have been closed for me in my work so far.

That said, 120k is pleeenty if you’re just building front-end components and have your API spec on hand already.