Comment by MikeNotThePope

7 hours ago

Is it ever useful to have a context window that full? I try to keep usage under 40%, or about 80k tokens, to avoid what Dex Horthy calls the dumb zone in his research-plan-implement approach. Works well for me so far.

No vibes allowed: https://youtu.be/rmvDxxNubIg?is=adMmmKdVxraYO2yQ

I'd been on Codex for a while and with Codex 5.2 I:

1) No longer found the dumb zone

2) No longer feared compaction

Since switching to Opus for stupid political reasons, I still haven't hit the dumb zone - but I'm back to disliking compaction events, so its smaller context window has really hurt.

I hope they copy OpenAI's compaction magic soon, but I am also very excited to try the longer context window.

  • 1m context in OpenAI and Gemini is just marketing. Opus is the only model to provide real usable big context.

    • I'm directly conveying my actual experience to you. I have tasks that fill up Opus context very quickly (at the 200k context) and which took MUCH longer to fill up Codex since 5.2 (which I think had 400k context at the time).

      This is direct comparison. I spent months subscribed to both of their $200/mo plans. I would try both and Opus always filled up fast while Codex continued working great. It's also direct experience that Codex continues working great post-compaction since 5.2.

      I don't know about Gemini but you're just wrong about Codex. And I say this as someone who hates reporting these facts because I'd like people to stop giving OpenAI money.

    • Source? I ask because I use 500k+ context on these on a daily basis.

      Big refactorings guided by automated tests eat context window for breakfast.

    • Codex high reasoning has been a legitimately excellent tool for generating feedback on every plan Claude opus thinking has created for me.

  • This is true.

    When I am using Codex, compaction isn't something I fear; it feels like saving your game progress and moving on.

    For Claude Code, compaction feels disastrous, and it also takes much longer.

Thanks for the video.

His fix for "the dumb zone" is the RPI Framework:

● RESEARCH. Don't code yet. Let the agent scan the files first. Docs lie. Code doesn't.

● PLAN. The agent writes a detailed step-by-step plan. You review and approve the plan, not just the output. Dex calls this avoiding "outsourcing your thinking." The plan is where intent gets compressed before execution starts.

● IMPLEMENT. Execute in a fresh context window. The meta-principle he calls Frequent Intentional Compaction: don't let the chat run long. Ask the agent to summarize state, open a new chat with that summary, keep the model in the smart zone.
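
The "Frequent Intentional Compaction" step can be sketched as a simple loop. Everything here is an assumption for illustration - the `agent.complete` client, the 4-characters-per-token heuristic, and the 80k budget - not a real API:

```python
# Sketch of "frequent intentional compaction" (hypothetical agent API).
# When the transcript nears a token budget, ask the model to summarize
# state, then restart a fresh conversation seeded with that summary.

TOKEN_BUDGET = 80_000  # ~40% of a 200k window, the "smart zone" rule of thumb

def approx_tokens(messages):
    # Rough heuristic: ~4 characters per token for English text and code.
    return sum(len(m["content"]) for m in messages) // 4

def chat_with_compaction(agent, messages, user_msg):
    messages.append({"role": "user", "content": user_msg})
    if approx_tokens(messages) > TOKEN_BUDGET:
        # Intentional compaction: summarize, then start a fresh context.
        summary = agent.complete(messages + [{
            "role": "user",
            "content": "Summarize the task state, key decisions, and open items.",
        }])
        messages = [
            {"role": "system", "content": "Prior context summary:\n" + summary},
            {"role": "user", "content": user_msg},
        ]
    reply = agent.complete(messages)
    messages.append({"role": "assistant", "content": reply})
    return messages, reply
```

The point is that you choose the compaction moment and the summary prompt, rather than letting the tool truncate arbitrarily mid-task.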

Yes. I've recently become a convert.

For me, it's less about being able to look back ~800k tokens. It's about being able to keep a conversation flowing a lot longer without forcing compaction. Generally, I really only need the most recent ~50k tokens, but having the old context sitting around is helpful.

  • Also, when you hit compaction at 200k tokens, that was probably when things were just getting good. The plan was in its final stage. The context had the hard-fought nuances discovered in the final moment. Or the agent just discovered some tiny important details after a crazy 100k token deep dive or flailing death cycle.

    Now you have to compact and you don’t know what will survive. And the built-in UI doesn’t give you good tools like deleting old messages to free up space.

    I’ll appreciate the 1M token breathing room.

    • I've found compaction kills the whole thing. Important debug steps go completely missing, and the AI loops back round thinking it's found a solution when we've already done that step.

When running long autonomous tasks it is quite frequent to fill the context, even several times. You are out of the loop so it just happens if Claude goes a bit in circles, or it needs to iterate over CI reds, or the task was too complex. I'm hoping a long context > small context + 2 compacts.

  • Yep I have an autonomous task where it has been running for 8 hours now and counting. It compacts context all the time. I’m pretty skeptical of the quality in long sessions like this so I have to run a follow on session to critically examine everything that was done. Long context will be great for this.

  • I haven't figured out how to make use of tasks running that long yet, or maybe I just don't have a good use case for it yet. Or maybe I'm too cheap to pay for that many API calls.

    • My change cuts across multiple systems with many tests/static analysis/AI code reviews happening in CI. The agent keeps pushing new versions and waits for results until all of them come up clean, taking several iterations.

    • I mean if you don't have your company paying for it I wouldn't bother... We are talking sessions of 500-1000 dollars in cost.

  • All of those things are smells imo; you should be very wary of any code output from a task that causes that much thrashing. In most cases it's better to rewind or reset and adapt your prompt to avoid the looping (which usually means a more narrowly defined scope)
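
The push-and-wait-for-CI loop described a couple of comments up can be sketched like this; `run_ci` and `ask_agent_to_fix` are hypothetical hooks standing in for a real CI poll and agent prompt:

```python
# Sketch of an autonomous "iterate until CI is green" loop.
# run_ci() returns a list of failing checks (empty when clean);
# ask_agent_to_fix(failures) stands in for prompting the agent
# and pushing a new revision. Both are hypothetical hooks.

def iterate_until_green(run_ci, ask_agent_to_fix, max_rounds=10):
    for rounds in range(max_rounds):
        failures = run_ci()
        if not failures:
            return rounds  # number of fix iterations it took
        ask_agent_to_fix(failures)
    raise RuntimeError("CI still red after max_rounds iterations")
```

The `max_rounds` cap is the important part: it bounds exactly the kind of thrashing the comment above warns about.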

Since I've yet to seriously dive into vibe coding or AI-assisted coding: does the IDE experience offer a running tally of the context size, so you know when you're getting close to (or entering) the "dumb zone"?

  • In Claude Code I believe it's /context, and it'll give you a graphical representation of what's taking up context space

  • The 2 I know, Cursor and Claude Code, will give you a percentage used for the context window. So if you know the size of the window, you can deduce the number of tokens used.

  • Cline gives you such a thing. You don't really know where the dumb zone is by the numbers, though, only by feel.

  • > Since I'm yet to seriously dive into vibe coding or AI-assisted coding

    Unless you’re using a text editor as an IDE, you probably already have.
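
The "deduce the tokens from the percentage" arithmetic mentioned above is a one-liner worth keeping as a helper. The 200k window and 40% threshold are the numbers from this thread, not universal constants:

```python
# Back-of-envelope helpers for tracking context usage when the tool
# only reports a percentage. The 200k window and the 40% "dumb zone"
# threshold come from this thread, not from any official spec.

def tokens_used(percent_used, window_tokens=200_000):
    """Deduce approximate tokens consumed from the reported percentage."""
    return int(percent_used / 100 * window_tokens)

def in_dumb_zone(percent_used, threshold_percent=40):
    """Heuristic: past ~40% of the window, quality reportedly degrades."""
    return percent_used > threshold_percent
```

So 40% of a 200k window is the ~80k-token ceiling mentioned at the top of the thread.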

It's kind of like having a 16 gallon gas tank in your car versus a 4 gallon tank. You don't need the bigger one the majority of the time, but the range anxiety that comes with the smaller one and annoyance when you DO need it is very real.

  • It seems possible, say a year or two from now, that context becomes more like a smart human with a “small” vs “medium” vs “large” working memory. The small fellow can play some popular songs on the piano, the medium one plays in an orchestra professionally, and the x-large is like Wagner composing the Der Ring marathon opera. This is my current, admittedly not well informed, mental model anyway. Well, at least we know we’ve got a little more time before the singularity :)

    • It’s more like the size of the desk the AI has to put sheets of paper on as a reference while it builds a Lego set. More desk area/context size = able to see more reference material = can do more steps in one go. I’ve lately been building checklists and having the LLM complete and check off a few tasks at a time, compacting in-between. With a large enough context I could just point it at a PLAN.md and tell it to go to work.
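
The checklist workflow in the comment above can be sketched with plain markdown checkboxes; the `- [ ]` PLAN.md format and the batch size are assumptions:

```python
import re

# Sketch of the checklist workflow: pull the next few unchecked items
# from a PLAN.md, hand them to the agent, and check them off afterwards.
# The "- [ ]" markdown checkbox format is an assumption.

def next_tasks(plan_md, batch=3):
    """Return the next `batch` unchecked checklist items."""
    return re.findall(r"^- \[ \] (.+)$", plan_md, flags=re.M)[:batch]

def check_off(plan_md, task):
    """Mark a single task as done in the checklist text."""
    return plan_md.replace(f"- [ ] {task}", f"- [x] {task}", 1)
```

With a large enough window you could skip the batching and point the agent at the whole file, which is the trade-off the comment is describing.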

After running a context window up high (probably near 70% on Opus 4.6 High) and watching it take 20% bites out of my 5hr quota per prompt, I've been experimenting with dumping context after completing a task. Seems to be working OK. I wonder if I was running into the long-context premium. Would that apply to Pro subs, or is it just relevant to API pricing?

Looking at this URL: is that a typo, or did YouTube flip the si tracking parameter?

  youtu.be/rmvDxxNubIg?is=adMmmKdVxraYO2yQ

I mean, try using Copilot on any substantial back-end codebase and watch it eat 90+% of the context just building a plan/checklist. Of course, Copilot is constrained to 120k, I believe? So having 10x that will blow open some doors that have been closed for me in my work so far.

That said, 120k is pleeenty if you’re just building front-end components and have your API spec on hand already.