Comment by vl

2 months ago

Honestly, I don’t understand the universal praise for Opus 4.5. It’s good, but really not better than other agents.

Just today:

Opus 4.5 Extended Thinking designed a Postgres schema for “stream updates after snapshot” with bugs.

Grok Heavy gave a correct solution without explanations.

ChatGPT 5.2 Pro gave a correct solution and also explained why a simpler way wouldn’t work.

Are you using Claude Code? Because that might be the secret sauce you're missing. With Claude Code I can instruct it to validate things after it's done with the code, and usually it finds that it goofed. I can also tell it to work on like five different things, and go "hey spin up some agents to work on this" and it will spawn 5 agents in parallel to work on said things.

I've basically ditched Grok et al and I refuse to give Sam Altman a penny.

vl 2 months ago
For the schema design phase I used the web UI for all three.
The logical bug of using BIGSERIAL for tracking updates (values are generated at insert time, not commit time, so rows can become visible out of order) wouldn’t be caught by any number of Claude Code iterations and would only be found in production after weeks of debugging.
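
To make the BIGSERIAL ordering problem concrete, here is a minimal sketch in Python. It is a toy model, not PostgreSQL itself: the `Table` class, `poll` function and `simulate` driver are hypothetical names made up for illustration. It assumes a consumer that streams updates with an "id greater than the last id I saw" cursor, which may not match the exact schema discussed above.

```python
from itertools import count


class Table:
    """Toy model of a table whose id comes from a BIGSERIAL-like sequence."""

    def __init__(self):
        self._seq = count(1)   # sequence value is handed out at INSERT time
        self.committed = {}    # id -> payload; only committed rows are visible

    def insert(self, payload):
        # The id is fixed now, but the row stays invisible until commit.
        return (next(self._seq), payload)

    def commit(self, pending):
        row_id, payload = pending
        self.committed[row_id] = payload


def poll(table, last_seen):
    """Hypothetical consumer: read committed rows with id > last_seen."""
    rows = sorted((i, p) for i, p in table.committed.items() if i > last_seen)
    new_cursor = rows[-1][0] if rows else last_seen
    return rows, new_cursor


def simulate():
    table = Table()
    t1 = table.insert("update A")   # transaction 1 gets id 1
    t2 = table.insert("update B")   # transaction 2 gets id 2
    table.commit(t2)                # transaction 2 commits first

    seen = []
    rows, cursor = poll(table, 0)   # consumer sees id 2, cursor moves to 2
    seen += rows

    table.commit(t1)                # transaction 1 commits late with id 1
    rows, cursor = poll(table, cursor)  # id 1 is not > 2, so it is skipped
    seen += rows
    return seen, cursor


seen, cursor = simulate()
print(seen, cursor)
```

Running it prints `[(2, 'update B')] 2`: the row with id 1 is committed but never delivered, because the cursor had already moved past it. A single-session test that inserts and commits rows one at a time would not show this, since the interleaving only happens with concurrent transactions.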
- simonw 2 months ago
  
  At this point having any LLM write code without giving it an environment that allows it to execute that code itself is like rolling a heavily-biased random number generator and hoping you get a useful result.
  Things get so much more interesting when they're able to execute the code they are writing to see if it actually works.
  