Comment by lukewarm707
3 days ago
Value is high, but what about the competitors?
Is Claude that good? The last time I tried Claude it was Sonnet 4.5. It was OK, but clearly not worth the API money. But I only use API tokens for LLMs.
If you look at SWE-bench, Claude models aren’t that special. Other benchmarks come up with different results.
But… anecdotally, Claude is just that good. Gemini needs a lot of hand-holding, and it will still tell you it’s done when it has achieved half the work. Or it will say, “this test isn’t passing, I’ll just delete it”. Every now and then I get tired of it and give the same task to Sonnet 4.6; five minutes later I’m done. Bug fixed, UI working properly, React hooks no longer called conditionally, theme variables used properly. It’s wonderful.
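For context on the hooks point: React’s Rules of Hooks require that hooks be called unconditionally, in the same order on every render, so an early return before a hook call is a bug. A minimal sketch of the kind of fix I mean (the component names and props are illustrative, not from any real task):

```tsx
import { useState } from "react";

// Broken: the early return means useState is only called on some
// renders, which violates the Rules of Hooks (hook call order must
// be identical on every render).
function BadBadge({ show }: { show: boolean }) {
  if (!show) return null;
  const [count, setCount] = useState(0); // conditional hook call
  return <button onClick={() => setCount(count + 1)}>{count}</button>;
}

// Fixed: call the hook unconditionally, then branch afterwards.
function GoodBadge({ show }: { show: boolean }) {
  const [count, setCount] = useState(0);
  if (!show) return null;
  return <button onClick={() => setCount(count + 1)}>{count}</button>;
}
```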
I’m not sure about large agentic work or deep thinking, but I’m mostly automating away the drudgery of dealing with React Native. I still want to do the deeper work myself, but even there Opus is usually a really good sparring partner.
Were you using the Gemini model with the Claude Code harness? Otherwise it isn’t a fair comparison.
I haven’t, but I’ve used Opus in Antigravity and it performs pretty much the same? It’s hard to tell minute differences.
Do you think Claude Code is what makes their models operate better?
And by the same token, what would give Gemini a fair run? Because I’ve used the Gemini chat app, Stitch, and the CLI, and the model can’t stop itself from a) saying it’s done when it isn’t; b) going off the rails; c) ignoring strict instructions after a while.
Matches my experience. I am not sure why, but subjectively it feels better.