Comment by UncleOxidant
11 hours ago
IIRC when Gemini 3 Pro came out it was considered to be just about on par with whatever version of Claude was out then (4?). Now Gemini 3 is looking long in the tooth. Considering how many Chinese models have been released since then, and at least 2 or 3 versions of Claude, it's starting to look like Google is kind of sitting still here. Maybe you're right and they'll surprise us soon with a large step improvement over what they currently have. Note: I do realize that there's been a Gemini 3.1 release, but it didn't seem like a noticeable change from 3.
As other people are saying here: the Gemini models are mostly terrible at tool use and long context management. And maybe not quite as good with finicky "detail" parts of coding generally.
Where they excel is just total holistic _knowledge_ about the world. I don't like "talking" to it, because I kind of hate its tone, but I find Gemini generally extremely useful for research and analysis tasks and looking up information.
People who say Gemini is bad at long contexts are so wrong.
You can put a whole 50,000 - 70,000 LOC codebase into Gemini 3.1 Pro's context, making it 800,000+ tokens, give it a detailed task, and ask for the whole changed files back, and it will execute, sometimes in one shot, sometimes in two. E.g., depending on what stack you work with, it may see all the errors at once and fix everything in a single reply.
Yes, it will give you back 5-15 files, up to 4,000 LOC total, with only the relevant parts changed.
This is a terribly inefficient way to burn $10 of tokens in 20 minutes, but the attention and 1:1 context retention are truly amazing.
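For what it's worth, the "whole codebase in one prompt" step above can be sketched with a small script. This is a hypothetical helper, not anything the commenter shared: the function name `pack_codebase`, the file-header format, and the ~4-characters-per-token estimate are all assumptions (a real tokenizer would give exact counts).

```python
import os

# Rough heuristic only: ~4 characters per token. An assumption,
# not a real tokenizer; use the provider's tokenizer for exact counts.
CHARS_PER_TOKEN = 4

def pack_codebase(root, extensions=(".py", ".ts", ".go")):
    """Concatenate source files under `root` into one prompt string,
    prefixing each file with its path so the model can be asked to
    return whole changed files under the same headers."""
    parts = []
    for dirpath, _, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.endswith(extensions):
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8", errors="replace") as f:
                    parts.append(f"===== {path} =====\n{f.read()}")
    prompt = "\n\n".join(parts)
    return prompt, len(prompt) // CHARS_PER_TOKEN

if __name__ == "__main__":
    prompt, approx_tokens = pack_codebase(".")
    print(f"packed ~{approx_tokens} tokens")
```

You'd then prepend the task description and paste the whole thing into the model, which is exactly the expensive-but-effective pattern described above.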
PS: At the same time, it is bad at tool use, but that has nothing to do with context.
Gemini had the best long context support for the longest time, and even now at >400k tokens it's still got the best long context recall.
Gemini is just not trained for autonomy/tool use/agentic behavior to the same degree as the other frontier models. Goog seems to emphasize video/images/scientific+world knowledge.
My experience is that it advertises a large context and then just becomes incoherent and confused as the session grows to fill it.
e.g. it sucks at general tool use but sucks even more at it after a chunk of time in a session. One frustrating situation is watching it go into a loop, trying and failing to edit source files.
I often wonder how my old coworkers from Google get by, if this is the agentic coding they have available to them for working on projects in Google3. But I suspect the models they work with have been fine-tuned on Google's custom tooling and perform better?