Comment by itsmevictor

6 hours ago

Noteworthily, although Gemini 3 Pro seems to have much higher benchmark scores than other models across the board (including compared to Claude), that's not the case for coding, where it appears to score essentially the same as the others. I wonder why that is.

So far, IMHO, Claude Code remains significantly better than Gemini CLI. We'll see whether that changes with Gemini 3.

> I wonder why that is.

That's because coding is currently the only reliable benchmark: it's the one area where reasoning capability actually transfers and predicts capability in other professions like law, and the one area where labs are shy about releasing numbers. All the other exam scores can be faked by gaming the benchmarks.

Probably because Anthropic's models have been optimized specifically for agentic coding...

EDIT: Don't disagree that Gemini CLI has a lot of rough edges, though.

In my experience, the quality of gemini-cli isn't great; I keep running into a lot of stupid bugs.

  • Google is currently laying people off constantly. Everyone who really excels has jumped ship, and the people who remain ... are not top of the class anymore.

    Not that Google didn't have problems shipping useful things before. But it's gotten a lot worse.

Gemini performs better if you use it with Claude Code than with Gemini CLI. It still has some odd problems with tool calling, but a lot of the performance loss is the Gemini CLI app itself.

Because benchmarks are a retarded comparison and have nothing to do with reality. It's just jerk material for AI fanboys.