
Comment by spwa4

3 months ago

But ... what's missing from this comparison: Kimi K2.

When ChatGPT exploded, OpenAI had at least double the benchmark scores of any other model, open or closed. Gemini 3 Pro (not the model they actually serve) outperforms the best open model ... wait, it does not uniformly beat the best open model anymore. Not even close.

Kimi K2 beats Gemini 3 Pro on several benchmarks. On average, Gemini 3 Pro scores just under 10% better than the best open model, currently Kimi K2.

Gemini 3 Pro is in fact only the best in about half the benchmarks tested there. This could be another Llama 4 moment: the reason Gemini 3 Pro comes out as the best model is a very high score on a single benchmark ("Humanity's Last Exam"); take that benchmark out and GPT-5.1 remains the best model available. The other big improvement is SciCode, and if you take that out too, the best open model, Kimi K2, beats Gemini 3 Pro.

https://artificialanalysis.ai/models

And then, there's the pricing:

Kimi K2 on OpenRouter: $0.50 / M input tokens, $2.40 / M output tokens

Gemini 3 Pro (contexts ≤ 200,000 tokens): $2.00 / M input tokens, $12.00 / M output tokens

Gemini 3 Pro (contexts > 200,000 tokens, long-context tier): $4.00 / M input tokens, $18.00 / M output tokens

So on input tokens, Gemini 3 Pro is 4 times (400%) the price of the best open model (and just under 8 times, 800%, with long context), and 70% more expensive than GPT-5.1.
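Those multiples are easy to sanity-check from the per-million-token prices quoted above (note the output-token multiple at short context works out to 5×, not 4×; the 4× figure is the input side):

```python
# Prices in USD per 1M tokens, as quoted in this comment.
kimi_in, kimi_out = 0.50, 2.40            # Kimi K2 on OpenRouter
gem_in, gem_out = 2.00, 12.00             # Gemini 3 Pro, contexts <= 200k
gem_long_in, gem_long_out = 4.00, 18.00   # Gemini 3 Pro, contexts > 200k

print(gem_in / kimi_in)         # input ratio, short context: 4x
print(gem_out / kimi_out)       # output ratio, short context: about 5x
print(gem_long_in / kimi_in)    # input ratio, long context: 8x
print(gem_long_out / kimi_out)  # output ratio, long context: about 7.5x
```

So "just under 8 times" holds for input exactly and for output approximately (7.5×) in the long-context tier.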

The closed models in general, and Google specifically, serve Gemini 3 Pro at double to triple the speed (in tokens per second) of OpenRouter. Though even here it is not the fastest; that's gpt-oss-120b on OpenRouter.