← Back to context

Comment by benbencodes

10 hours ago

Pricing is now live on ai.google.dev/pricing:

Gemini 3.5 Flash: $0.75 input / $4.50 output per 1M tokens, 1M context window. The output price explicitly "includes thinking tokens" — which is why it's higher than a typical flash-class model.

For comparison within the Gemini lineup: - Gemini 2.5 Flash: $0.30 / $2.50 - Gemini 3.1 Flash-Lite: $0.25 / $1.50 - Gemini 3.1 Pro Preview: $2.00 / $12.00

So 3.5 Flash is ~2.5x more expensive input vs 2.5 Flash. The pricing and "including thinking tokens" framing position it as a reasoning-capable flash model rather than just a pure speed optimization.

You’re quoting the batch pricing. On demand is 1.5 per input and 9 per M output. This is effectively comparable cost to Gemini 2.5 Pro in a flash tier model

I think you have your pricing wrong there, Gemini 3.5 flash is $1.50 input and $9 output.

  • Okay, it's kind of somewhere between haiku and sonnet level pricing, at somewhere between sonnet and opus level performance. Its a great option. I was hoping to see opus class intelligence at haiku level pricing out of google, and this is close to that!

    • Never mind, after looking at more benchmarks, seems closer to sonnet level intelligence at slightly lower cost. Speed is great for latency sensitive applications, but if this was 1/2 the cost it would have been priced to win.

      If this is the big model release out of google, its a disappointent.

You are seeing batch inference, standard inference is $1.5/$9. I was excited until I saw that price.

Standard pricing is showing for me as $1.50 / $9.

(I suspect you're viewing the "flex" pricing).

In addition to people pointing out your LLM got the pricing wrong,

> The pricing and "including thinking tokens" framing position it as a reasoning-capable flash model rather than just a pure speed optimization

Every Gemini model starting with 2.5 has been a reasoning model.