Comment by alecco

3 months ago

For SWE it is the same ranking. But if Google's $20/mo plan is comparable to the $100-200 plans from OpenAI and Anthropic, then yes, they are done.

But we'll have to wait a few weeks to see whether the nerfed post-release model is still as good.

3 comments

siva7 3 months ago

I have a few secret prompts to test the complex reasoning capabilities of new models (in law and medicine). Gemini (2.5 Pro) is behind Anthropic (Sonnet 4.5 basic thinking) and OpenAI (pro model) by a wide margin on my own benchmark, and I trust my own benchmark more than public leaderboards. So it's the other way around: Google is trying to catch up to where the others are. It just doesn't seem so to some because Google undercuts on price and most people don't have their own complex problems with a verified solution to test against (so they could see how bad Gemini is in reality).

alecco 3 months ago

This thread is about Gemini 3. It will be interesting to see your benchmark results when it's available later.