Comment by ifwinterco

6 days ago

Opus 4.6 genuinely seems worse than 4.5 was in Q4 2025 for me. I know everyone always says this and anecdote != data but this is the first time I've really felt it with a new model to the point where I still reach for the old one.

I'll give GPT 5.3 codex a real try I think

Huh… I’ve seen this comment a lot in this thread but I’ve really been impressed with both Anthropic’s latest models and latest tooling (plugins like /frontend-design mean it actually designs real front ends instead of the vibe coded purple gradient look). And I see it doing more planning and making fewer mistakes than before. I have to do far less oversight and debugging broken code these days.

But if people really like Codex better, maybe I’ll try it. I’ve been trying not to pay for 2 subscriptions at once but it might be worth a test.

  • > And I see it doing more planning and making fewer mistakes than before

    Anecdotally, maybe this is the reason? It does seem to spend a lot more time “thinking” before giving what feels like equivalent results, most of the time.

    Probably eats into the gambling-style adrenaline cycles.

I asked Codex 5.3 and Opus 4.6 to write me a macOS application with a certain set of requirements.

Opus 4.6 wrote me a working macOS application.

Codex wrote me an HTML + CSS mockup of a macOS application that didn't even look like a macOS application at all.

Opus 4.5 was fine, but I feel that 4.6 is more often on the money on its implementations than 4.5 was. It is just slower.

  • I asked both to help me with a hardware bug. Codex kept trying things, being sure of what the problem is every time, and every time making it worse.

    Opus went off and browsed my dependencies for ten minutes, and came back and solved the problem first try.

    • Heh, I find Codex to be a far, far smarter model than Claude Code.

      And there's a good reason the most "famous" vibe coders, including the OpenClaw creator, all moved to Codex: it's just better.

      Claude writes a lot more code to do anything: tons of redundant code, repeated code, etc. Codex is the only model I've seen which occasionally removes more code than it writes.

    • Funnily enough I've been using Codex 5.3 on maximum thinking for bug hunting and code reviews and it's been really good at it (it just seems to have a completely different focus than Opus).

      I generally don't like the way Codex approaches coding itself, so I just feed its review comments back into Claude Code and off we go.

I agree with you. Codex 5.3 is good, it's just a bit slower.

  • It is (slower), especially at the xhigh setting. But if I have to redo things three times and keep confirming trivial stuff (Claude Code seems to keep changing the commands it uses to read code: once it uses "bash-read", once "tree", once "head", and I have to keep confirming permission), I definitely waste more time than if I give a command to Codex (or in my case OpenCode + a Codex model) and come back after 10 minutes.