Comment by dimitri-vs

14 hours ago

The big change here is:

> Standard pricing now applies across the full 1M window for both models, with no long-context premium. Media limits expand to 600 images or PDF pages.

For Claude Code users this is huge - assuming coherence remains strong past 200k tok.

83 comments


MikeNotThePope 8 hours ago

Is it ever useful to have a context window that full? I try to keep usage under 40%, or about 80k tokens, to avoid what Dex Horthy calls the dumb zone in his research-plan-implement approach. Works well for me so far.

No vibes allowed: https://youtu.be/rmvDxxNubIg?is=adMmmKdVxraYO2yQ
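For reference, here is the arithmetic behind that rule of thumb as a minimal Python sketch. It assumes a 200k-token window, which is what "40% ≈ 80k" implies. The function names and the strict less-than cutoff are illustrative choices and are not part of the RPI approach.

```python
def usage_percent(tokens_used: int, window_tokens: int = 200_000) -> float:
    """Share of the context window in use, as a percentage."""
    return 100 * tokens_used / window_tokens


def in_smart_zone(tokens_used: int, window_tokens: int = 200_000, limit_pct: int = 40) -> bool:
    """True while usage stays strictly under limit_pct of the window.

    Integer math avoids floating-point edge cases right at the threshold.
    """
    return tokens_used * 100 < limit_pct * window_tokens


# 40% of an assumed 200k window is the ~80k-token ceiling mentioned above.
ceiling = 40 * 200_000 // 100
print(ceiling)                 # 80000
print(in_smart_zone(79_999))   # True
print(in_smart_zone(80_000))   # False
```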

furyofantares 7 hours ago
I'd been on Codex for a while and with Codex 5.2 I:
1) No longer found the dumb zone
2) No longer feared compaction
Since switching to Opus for stupid political reasons, I still haven't hit the dumb zone, but I'm back to disliking compaction events, so its smaller context window has really hurt.
I hope they copy OpenAI's compaction magic soon, but I am also very excited to try the longer context window.
- pjerem 24 minutes ago
  
  If you use OpenCode (open source Claude Code implementation), you can configure compaction yourself: https://opencode.ai/docs/en/config/#compaction
- mgambati 7 hours ago
  
  1M context in OpenAI and Gemini is just marketing. Opus is the only model to provide real, usable big context.
- karmasimida 4 hours ago
  
  This is true.
  When I am using Codex, compaction isn’t something I fear; it feels like saving your game progress and moving on.
  With Claude Code, compaction feels disastrous, and it also takes much longer.
- iknowstuff 7 hours ago
  
  Hmm, I’ve felt the dumb zone on Codex.
dev_l1x_be 1 hour ago

I never use these giant context windows. It is pointless. Agents are great at super-focused work that is easy to redo. I'm not sure what the use case for giant context windows is.
kaizenb 6 hours ago
Thanks for the video.
His fix for "the dumb zone" is the RPI Framework:
● RESEARCH. Don't code yet. Let the agent scan the files first. Docs lie. Code doesn't.
● PLAN. The agent writes a detailed step-by-step plan. You review and approve the plan, not just the output. Dex calls this avoiding "outsourcing your thinking." The plan is where intent gets compressed before execution starts.
● IMPLEMENT. Execute in a fresh context window. The meta-principle he calls Frequent Intentional Compaction: don't let the chat run long. Ask the agent to summarize state, open a new chat with that summary, keep the model in the smart zone.
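A minimal sketch of the Frequent Intentional Compaction loop described above, assuming you can get a state summary from the agent on demand. Everything here is hypothetical: `Session`, `maybe_compact`, and the word-count token estimate are illustrative stand-ins, and `summarize` represents whatever you use to ask the agent for its summary. This is not Dex Horthy's code or any tool's API.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Session:
    messages: list[str] = field(default_factory=list)

    def tokens(self) -> int:
        # Crude stand-in for a real tokenizer: count whitespace-separated words.
        return sum(len(m.split()) for m in self.messages)


def maybe_compact(session: Session,
                  summarize: Callable[[list[str]], str],
                  budget: int) -> Session:
    """If the session exceeds `budget`, open a fresh one seeded only with a summary."""
    if session.tokens() <= budget:
        return session  # still in the "smart zone"; keep going
    summary = summarize(session.messages)
    return Session(messages=[f"Summary of prior work: {summary}"])


def fake_summary(msgs: list[str]) -> str:
    # Placeholder for asking the agent to summarize its current state.
    return f"{len(msgs)} messages covering research and plan"


s = Session(["research notes " * 50, "plan step " * 50])  # ~200 "tokens"
fresh = maybe_compact(s, fake_summary, budget=60)
print(fresh.messages)
```

The key design point is that the new session carries only the summary, not the old transcript, which is what keeps the model out of the dumb zone.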
- iamacyborg 1 hour ago
  
  > RESEARCH. Don't code yet. Let the agent scan the files first. Docs lie. Code doesn't.
  I find myself often running validity checks between docs and code and addressing gaps as they appear to ensure the docs don’t actually lie.
- girvo 5 hours ago
  
  That's fascinating: that is identical to the workflow I've landed on myself.
SkyPuncher 8 hours ago
Yes. I've recently become a convert.
For me, it's less about being able to look back 800k tokens. It's about being able to keep a conversation flowing for a lot longer without forcing compaction. Generally, I really only need the most recent ~50k tokens, but having the old context sitting around is helpful.
- hombre_fatal 7 hours ago
  
  Also, when you hit compaction at 200k tokens, that was probably when things were just getting good. The plan was in its final stage. The context had the hard-fought nuances discovered in the final moment. Or the agent just discovered some tiny important details after a crazy 100k token deep dive or flailing death cycle.
  Now you have to compact and you don’t know what will survive. And the built-in UI doesn’t give you good tools like deleting old messages to free up space.
  I’ll appreciate the 1M token breathing room.
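The "delete old messages to free up space" idea from the two comments above can be sketched as a sliding window that keeps the newest messages under a token budget. This is a hypothetical helper, not a Claude Code feature, and it assumes per-message token counts have already been computed.

```python
def trim_oldest(messages: list[tuple[str, int]], budget: int) -> list[tuple[str, int]]:
    """Drop the oldest messages until the total token count fits in `budget`.

    Each message is a (text, token_count) pair; counts are assumed precomputed.
    The result is always a contiguous run of the most recent messages.
    """
    kept: list[tuple[str, int]] = []
    total = 0
    # Walk from newest to oldest so the most recent context survives.
    for text, n in reversed(messages):
        if total + n > budget:
            break
        kept.append((text, n))
        total += n
    kept.reverse()  # restore chronological order
    return kept


history = [("early research", 30), ("plan", 30), ("latest diff", 30)]
print(trim_oldest(history, budget=60))  # [('plan', 30), ('latest diff', 30)]
```

Stopping at the first message that does not fit (rather than skipping it and continuing) avoids leaving gaps in the middle of the conversation.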
ogig 8 hours ago
When running long autonomous tasks it is quite frequent to fill the context, even several times. You are out of the loop so it just happens if Claude goes a bit in circles, or it needs to iterate over CI reds, or the task was too complex. I'm hoping a long context > small context + 2 compacts.
- SequoiaHope 8 hours ago
  
  Yep I have an autonomous task where it has been running for 8 hours now and counting. It compacts context all the time. I’m pretty skeptical of the quality in long sessions like this so I have to run a follow on session to critically examine everything that was done. Long context will be great for this.
- MikeNotThePope 8 hours ago
  
  I haven't figured out how to make use of tasks running that long yet, or maybe I just don't have a good use case for it yet. Or maybe I'm too cheap to pay for that many API calls.
- boredtofears 8 hours ago
  
  All of those things are smells, imo; you should be very wary of any code output from a task that causes that much thrashing. In most cases it’s better to rewind or reset and adapt your prompt to avoid the looping (which usually means a more narrowly defined scope).
dimitri-vs 8 hours ago
It's kind of like having a 16 gallon gas tank in your car versus a 4 gallon tank. You don't need the bigger one the majority of the time, but the range anxiety that comes with the smaller one and annoyance when you DO need it is very real.
- steve-atx-7600 8 hours ago
  
  It seems possible that, say, a year or two from now, context will be more like a smart human with a “small” vs. “medium” vs. “large” working memory. The small fellow would be able to play some popular songs on the piano, the medium one plays professionally in an orchestra, and the x-large one is like Wagner composing Der Ring, his marathon opera cycle. This is my current, admittedly not well-informed mental model anyway. Well, at least we know we’ve got a little more time before the singularity :)
- scwoodal 8 hours ago
  
  Except after 4 gallons it might as well be pure oil, mucking everything up.
ricksunny 7 hours ago
Since I'm yet to seriously dive into vibe coding or AI-assisted coding, does the IDE experience offer tracking a tally of the context size? (So you know when you're getting close or entering the "dumb zone")?
- jfim 3 hours ago
  
  In Claude Code I believe it's /context, and it'll give you a graphical representation of what's taking up context space.
- MikeNotThePope 6 hours ago
  
  The 2 I know, Cursor and Claude Code, will give you a percentage used for the context window. So if you know the size of the window, you can deduce the number of tokens used.
- 8note 6 hours ago
  
  Cline gives you such a thing. You don't really know where the dumb zone starts by the numbers, though, only by feel.
- stevula 7 hours ago
  
  Most tools do, yes.
- quux 7 hours ago
  
  OpenCode does this. Not sure about other tools
- nujabe 7 hours ago
  
  > Since I'm yet to seriously dive into vibe coding or AI-assisted coding
  Unless you’re using a plain text editor as your IDE, you probably already have.
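The "percentage used → tokens used" deduction mentioned above is simple arithmetic; here is a minimal sketch. The window sizes are assumptions you supply (e.g. 200k or 1M), and if the tool shows whole-number percentages the estimate is only accurate to about half a percent of the window.

```python
def tokens_from_percent(percent_used: float, window_tokens: int) -> int:
    """Estimate tokens in use from the percentage a tool displays."""
    return round(percent_used / 100 * window_tokens)


def tokens_remaining(percent_used: float, window_tokens: int) -> int:
    """Estimate how much room is left before the window is full."""
    return window_tokens - tokens_from_percent(percent_used, window_tokens)


print(tokens_from_percent(40, 200_000))   # 80000
print(tokens_remaining(70, 1_000_000))    # 300000
```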
maskull 7 hours ago

After running a context window up high, probably near 70% on Opus 4.6 (High), and watching it take 20% bites out of my 5hr quota per prompt, I've been experimenting with dumping context after completing a task. Seems to be working OK. I wonder if I was running into the long-context premium. Would that apply to Pro subs, or is it just relevant to API pricing?
saaaaaam 7 hours ago
That video is bizarre. Such a heavy breather.
- coldtea 5 hours ago
  
  What a weird and inconsequential thing to focus on...
  He's just fucking closely miked with compression, and speaking fast, anxious/excited, in front of an audience.
- indigodaddy 5 hours ago
  
  Most of that is just nervousness
Barbing 5 hours ago
Looking at this URL: typo, or did YouTube flip the si tracking parameter?
youtu.be/rmvDxxNubIg?is=adMmmKdVxraYO2yQ
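Either way, a share link works without the tracking parameter. A minimal sketch using Python's standard `urllib.parse`, assuming `si` is the tracking parameter; `is` is included only because the link above uses it, and whether that is a typo or a variant is unknown here.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: "si" is YouTube's share-tracking parameter; "is" is included
# only because the link above uses it (typo or variant, unknown).
TRACKING_PARAMS = {"si", "is"}


def strip_tracking(url: str) -> str:
    """Return `url` with tracking query parameters removed."""
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k not in TRACKING_PARAMS]
    # urlunsplit omits the "?" entirely when the remaining query is empty.
    return urlunsplit(parts._replace(query=urlencode(query)))


print(strip_tracking("https://youtu.be/rmvDxxNubIg?is=adMmmKdVxraYO2yQ"))
# https://youtu.be/rmvDxxNubIg
```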
bushbaba 6 hours ago

Yes. I’ve used it for data analysis
twodave 7 hours ago

I mean, try using Copilot on any substantial back-end codebase and watch it eat 90+% just building a plan/checklist. Of course, Copilot is constrained to 120k, I believe? So having 10x that will blow open some doors that have been closed to me in my work so far.
That said, 120k is pleeenty if you’re just building front-end components and have your API spec on hand already.

a_e_k 8 hours ago

I've been using the 1M window at work through our enterprise plan as I'm beginning to adopt AI in my development workflow (via Cline). It seems to have been holding up pretty well until about 700k+. Sometimes it would continue to do okay past that, sometimes it started getting a bit dumb around there.

(Note that I'm using it in more of a hands-on pair-programming mode, and not in a fully-automated vibecoding mode.)

chatmasta 8 hours ago

So a picture is worth 1,666 words?
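The joke is a back-of-the-envelope division of the 1M-token window by the new 600-image limit, treating one token as one word. In practice a token is usually somewhat less than a word, and per-image token cost depends on resolution, so this is only the ceiling if you max out both limits at once:

$$\frac{1{,}000{,}000\ \text{tokens}}{600\ \text{images}} \approx 1{,}666.7\ \text{tokens per image}$$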

hagen8 8 hours ago

Well, the question is what is contributing to the usage. As the context grows, the number of input tokens increases. A model call with 800K input tokens is 8 times more expensive than one with 100K input tokens. Especially if we resume a conversation and the cache doesn't hit, it would be very expensive with API pricing.
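A minimal sketch of that cost scaling, assuming flat per-token input pricing (no long-context premium, as the announcement states). The price and the cache-read multiplier below are hypothetical placeholders, not published rates; the point is only that uncached input cost grows linearly with context size.

```python
import math


def input_cost(input_tokens: int, price_per_mtok: float,
               cached_tokens: int = 0, cache_read_multiplier: float = 1.0) -> float:
    """Cost of one call's input under flat per-token pricing.

    `cache_read_multiplier` is a hypothetical discount factor for cache hits;
    1.0 means cached tokens cost the same as uncached ones.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached = input_tokens - cached_tokens
    billable = uncached + cached_tokens * cache_read_multiplier
    return billable / 1_000_000 * price_per_mtok


PRICE = 5.0  # hypothetical $ per million input tokens

# Cache miss: every call re-bills the whole context.
ratio = input_cost(800_000, PRICE) / input_cost(100_000, PRICE)
print(math.isclose(ratio, 8.0))  # True

# Cache hit (hypothetical 0.1x multiplier): resuming is much cheaper.
hit = input_cost(800_000, PRICE, cached_tokens=790_000, cache_read_multiplier=0.1)
miss = input_cost(800_000, PRICE)
print(hit < miss)  # True
```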

islewis 8 hours ago

The quality with the 1M window has been very poor for me, specifically for coding tasks. It constantly forgets stuff that has happened in the existing conversation. n=1, ymmv

robwwilliams 6 hours ago

Yes, especially with shifts in focus in a long conversation. But given the high error rates of Opus 4.6 over the last few weeks, it may be due to other factors. Conversational and code prodding has been essential.