Comment by simonw

3 hours ago

My experience is that coding agents as of November (GPT-5.2/Opus 4.5) produce high quality, production-worthy code against both small and large projects.

I base this on my own experience with them, plus conversations with many other peers whom I respect.

You can argue that OpenAI Codex's use of Electron disproves this if you like. I think it demonstrates a team making the safer choice in a highly competitive race against Anthropic and Google.

If you're wondering why we aren't seeing seismic results from these new tools yet, I'll point out that November was just over two months ago, and the December holiday period fell in the middle of that.

I'm not sure I buy the safer choice argument. How much of a risk is it to assign a team of "agents" to independently work on porting the code natively? If they fail, it costs a trivial amount of compute relative to OAI's resources. If they succeed, what a PR coup that would be! It seems like they would have nothing to lose by at least trying, but they either did not try, or they did and it failed, neither of which inspires confidence in their supposedly life-changing, world-changing product.

I will note that you specifically said the agents have shown huge success over "the past 12 months", so it feels like the goalposts are growing legs when you now say "actually, only for the last two months with Opus 4.5".

  • Claude Code was released in February; it just had its one-year birthday a few days ago.

    OpenAI Codex CLI and Gemini CLI followed a few months afterwards.

    It took a little while for the right set of coding agent features to be developed and for the models to get good enough to use those features effectively.

    I think this stuff went from interesting to useful around Sonnet 4, and from useful to "let it write most of my code" with the upgrades in November.