Comment by teaearlgraycold

6 days ago

Personally I've found these bigger models (o3/Claude 4 Opus) to be disappointing for coding.

Opus is really great, but when used through Claude Code. If you used it through Cursor or RooCode, it's understandable that you were disappointed.

  • This matches my experience, but I can't explain it. Do you know what's going on?

    • My understanding is context size. Companies like Cursor are trying to minimize the amount of context sent to the models to keep their own costs down. Claude Code seems to send a lot more context with every request and that seems to make the difference.

    • Just guessing, but the new Opus was probably RL-tuned to work better with Claude Code's tool calls.

  • I had the opposite experience. Not with Opus (too expensive), but with Sonnet. I got things done way more efficiently when using Sonnet with Roo than with Claude Code.

    • Same. I ran a few tests ($100 worth of API calls) with Opus 4 and didn’t see any difference compared to Sonnet 4 other than the price.

      Also, no idea why he thinks Roo is handicapped when Claude Code nerfs the thinking output and requires typing “think” / “think hard” / “think harder” / “ultrathink” just to expand the max thinking tokens, which on ultrathink only sets it at 32k, when the max in Roo is 51200 and it’s just a setting.

I found them all disappointing in their own ways. At least DeepSeek models actually listen to what I say instead of ignoring me and doing their own thing like a toddler.