Comment by benbencodes

10 hours ago

Pricing is now live on ai.google.dev/pricing:

Gemini 3.5 Flash: $0.75 input / $4.50 output per 1M tokens, 1M context window. The output price explicitly "includes thinking tokens" — which is why it's higher than a typical flash-class model.

For comparison within the Gemini lineup:

- Gemini 2.5 Flash: $0.30 / $2.50
- Gemini 3.1 Flash-Lite: $0.25 / $1.50
- Gemini 3.1 Pro Preview: $2.00 / $12.00

So 3.5 Flash is ~2.5x more expensive on input than 2.5 Flash. The pricing and the "including thinking tokens" framing position it as a reasoning-capable flash model rather than a pure speed optimization.
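
To make the ratio arithmetic easy to check, here is a minimal Python sketch. It assumes the per-1M-token prices exactly as quoted in this comment (they are not verified against the pricing page, and may not be the standard tier). The names `PRICES`, `price_ratio`, and `request_cost` are hypothetical helpers, not part of any Google API.

```python
# Prices in USD per 1M tokens as (input, output), copied from the comment above.
# Assumption: these figures are taken at face value and may not be standard-tier rates.
PRICES = {
    "Gemini 3.5 Flash": (0.75, 4.50),
    "Gemini 2.5 Flash": (0.30, 2.50),
    "Gemini 3.1 Flash-Lite": (0.25, 1.50),
    "Gemini 3.1 Pro Preview": (2.00, 12.00),
}


def price_ratio(model: str, baseline: str) -> tuple[float, float]:
    """Return (input_ratio, output_ratio) of `model` relative to `baseline`."""
    m_in, m_out = PRICES[model]
    b_in, b_out = PRICES[baseline]
    return m_in / b_in, m_out / b_out


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request, given token counts and per-1M-token prices."""
    m_in, m_out = PRICES[model]
    return (input_tokens * m_in + output_tokens * m_out) / 1_000_000


if __name__ == "__main__":
    in_ratio, out_ratio = price_ratio("Gemini 3.5 Flash", "Gemini 2.5 Flash")
    print(f"input: {in_ratio:.2f}x, output: {out_ratio:.2f}x")
```

On these quoted numbers the input ratio is 2.5x and the output ratio is 1.8x, so the gap is narrower on output than on input. Swapping in different prices only requires editing the `PRICES` table.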

8 comments

lyjackal 10 hours ago

You’re quoting the batch pricing. On-demand is $1.50 per 1M input tokens and $9 per 1M output tokens. That makes it effectively comparable in cost to Gemini 2.5 Pro, in a flash-tier model.

conorh 10 hours ago

I think you have your pricing wrong there: Gemini 3.5 Flash is $1.50 input and $9 output.

mchusma 10 hours ago
Okay, it's somewhere between Haiku and Sonnet level pricing, at somewhere between Sonnet and Opus level performance. It's a great option. I was hoping to see Opus-class intelligence at Haiku-level pricing out of Google, and this is close to that!
- mchusma 10 hours ago
  
  Never mind, after looking at more benchmarks, it seems closer to Sonnet-level intelligence at slightly lower cost. Speed is great for latency-sensitive applications, but if this were half the cost it would have been priced to win.
  If this is the big model release out of Google, it's a disappointment.

ls_stats 10 hours ago

You are seeing batch inference pricing; standard inference is $1.50/$9. I was excited until I saw that price.

jpau 10 hours ago

Standard pricing is showing for me as $1.50 / $9.

(I suspect you're viewing the "flex" pricing).

Tiberium 10 hours ago

Please delete/edit your AI-written and factually wrong post.

MallocVoidstar 9 hours ago

In addition to people pointing out your LLM got the pricing wrong,

> The pricing and "including thinking tokens" framing position it as a reasoning-capable flash model rather than just a pure speed optimization

Every Gemini model starting with 2.5 has been a reasoning model.