Comment by mgambati
10 hours ago
1M context in OpenAI and Gemini is just marketing. Opus is the only model to provide real, usable big context.
I'm directly conveying my actual experience to you. I have tasks that fill up Opus's context very quickly (at its 200k limit) and that took MUCH longer to fill up Codex since 5.2 (which I think had a 400k context window at the time).
This is a direct comparison: I spent months subscribed to both of their $200/mo plans. I would try both, and Opus always filled up fast while Codex continued working great. It's also direct experience that Codex has kept working great post-compaction since 5.2.
I don't know about Gemini, but you're just wrong about Codex. And I say this as someone who hates reporting these facts, because I'd like people to stop giving OpenAI money.
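(For anyone unfamiliar with "compaction": when the transcript nears the context limit, the harness summarizes the older turns and continues from the summary. A rough sketch of the idea in Python, not Codex's actual implementation; `llm` is a hypothetical stand-in for whatever completion call your harness makes, and the token count is a crude character-based approximation:)

    # A sketch of "compaction": once the transcript nears the context
    # limit, summarize the oldest turns and keep going from the summary.
    # `llm` is a hypothetical stand-in for a single completion call.
    def count_tokens(messages):
        # crude approximation; real harnesses use the model's tokenizer
        return sum(len(m["content"]) // 4 for m in messages)

    def compact(messages, llm, limit=200_000, keep_recent=20):
        if count_tokens(messages) < int(limit * 0.9):
            return messages  # still plenty of room, nothing to do
        old, recent = messages[:-keep_recent], messages[-keep_recent:]
        summary = llm(
            "Summarize this session so work can continue from it. "
            "Keep file paths, decisions made, and open TODOs:\n\n"
            + "\n".join(m["content"] for m in old)
        )
        return [{"role": "system", "content": summary}] + recent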
I agree. Even though I used to be a die-hard Claude fan, I recently switched back to ChatGPT and Codex to try them out again, and they've clearly pulled into the lead on consistency, context length and management, and speed. Claude Code instilled a dread in me about keeping an eye on context, but I'm slowly learning to let that go with Codex.
This has been my experience too.
Have any of you heard of map-reduce?
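(For the unfamiliar: map-reduce sidesteps the window entirely by mapping an LLM over chunks that each fit in context, then reducing the partial results. A rough sketch under the same caveats, with `llm` again a hypothetical single-prompt completion call:)

    # Map-reduce over input too big for one context window.
    def map_reduce_review(files, llm, chunk_size=20):
        # map: review each chunk independently, each in a fresh context
        partials = []
        for i in range(0, len(files), chunk_size):
            chunk = files[i:i + chunk_size]
            partials.append(llm(
                "Review these files and list concrete bugs:\n\n"
                + "\n\n".join(chunk)
            ))
        # reduce: merge the per-chunk findings into one report
        return llm(
            "Merge these partial reviews, deduplicate findings, "
            "and rank them by severity:\n\n" + "\n---\n".join(partials)
        )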
[flagged]
When Anthropic said they wouldn't sell LLMs to the government for mass surveillance or autonomous killing machines, and got labeled a supply chain risk as a result, OpenAI told the public they had the same policy as Anthropic while inking a deal with the government that clearly means "actually, we will sell you LLMs for mass surveillance or autonomous killing machines, but only if you tell us it's legal".
If you already knew all that, I'm not interested in an argument, but if you didn't know any of it, you might be interested in looking it up.
edit: Your post history has tons of posts on the topic, so clearly I just responded to flamebait, and I regret giving it my time and energy.
Source? I ask because I use 500k+ of context on these models on a daily basis.
Big refactorings guided by automated tests eat context window for breakfast.
I find Gemini gets really bad when you get far into the context: it gets into loops, forgets how to call tools, etc.
Yeah, Gemini is dumb when you tell it to do stuff, but the things it finds in reviews (and, critically, confirms, including making tool calls to validate its hypotheses) absolutely destroy both GPT and Opus.
If you're a one-model shop, you're losing out on the quality of the software you deliver, today. I predict we'll all have at least two harness+model subscriptions as a matter of course in 6-12 months, since every model's jagged frontier is different at the margins, and the margins are very fractal.
I find Gemini does that as a matter of course, personally. It's noticeably worse in my usage than either Claude or Codex.
I find Gemini to be real bad. Are you just using it for price reasons, or?
How many big refactorings are you doing? And why?
How is that relevant? We are talking about the models, not what you do with them.
Codex on high reasoning has been a legitimately excellent tool for generating feedback on every plan Claude Opus (thinking) has created for me.