Comment by itsmevictor

6 hours ago

Noteworthily, although Gemini 3 Pro seems to have much higher benchmark scores than other models across the board (including compared to Claude), that's not the case for coding, where it appears to score essentially the same as the others. I wonder why that is.

So far, IMHO, Claude Code remains significantly better than Gemini CLI. We'll see whether that changes with Gemini 3.

> I wonder why that is.

That's because coding is currently the only reliable benchmark: it's the one area where reasoning capability actually transfers and predicts capability in other professions like law, and the one area where labs are shy about releasing numbers. All the other exam scores can be faked by gaming the benchmarks.

Probably because Anthropic's models have been optimized specifically for agentic coding...

EDIT: Don't disagree that Gemini CLI has a lot of rough edges, though.

In my experience, the quality of gemini-cli isn't great; I keep running into a lot of stupid bugs.

  • Google is currently laying people off constantly. Everyone who really excels has jumped ship, and the people who remain ... are not top of the class anymore.

    Not that Google didn't have problems shipping useful things before. But it's gotten a lot worse.

Gemini performs better if you use it with Claude Code than with Gemini CLI. It still has some odd problems with tool calling, but a lot of the performance loss is the Gemini CLI app itself.

Because benchmarks are a retarded comparison and have nothing to do with reality. It's just jerk material for AI fanboys.