Comment by GodelNumbering

9 hours ago

Per million input/output tokens:

Gemini 2.5 flash: $0.30/$2.50

Gemini 3.0 flash preview: $0.50/$3.00

Gemini 3.5 flash: $1.50/$9.00

Interesting pricing direction. I don't think we have ever seen a 3x price increase for the immediate next same-sized model (and lol @ 3 only ever getting a preview).

3.5 Flash costs about the same as Gemini 2.5 Pro, which was $1.25/$10.
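
For anyone who wants to check the multipliers, here is a quick sketch in Python. It uses only the list prices quoted above; the names `PRICES`, `price_ratio`, and `blended_cost` are made up for illustration and do not come from any pricing API.

```python
# Per-million-token list prices quoted in this comment: (input, output) in USD.
PRICES = {
    "Gemini 2.5 Flash": (0.30, 2.50),
    "Gemini 3.0 Flash": (0.50, 3.00),
    "Gemini 3.5 Flash": (1.50, 9.00),
    "Gemini 2.5 Pro": (1.25, 10.00),
}


def price_ratio(new: str, old: str) -> tuple[float, float]:
    """Return (input multiplier, output multiplier) of `new` relative to `old`."""
    new_in, new_out = PRICES[new]
    old_in, old_out = PRICES[old]
    return round(new_in / old_in, 2), round(new_out / old_out, 2)


def blended_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of a hypothetical workload at list price (no caching or discounts)."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


if __name__ == "__main__":
    for old in ("Gemini 3.0 Flash", "Gemini 2.5 Flash", "Gemini 2.5 Pro"):
        print(old, price_ratio("Gemini 3.5 Flash", old))
```

Relative to 3.0 Flash this gives 3x on both input and output, which is where the "3x" figure comes from. The token mix in `blended_cost` is whatever you pass in, so the real-world multiplier depends on your own input/output split.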

127 comments


__jl__ 8 hours ago

This understates the cost increase. 3.5 Flash also uses more tokens. artificialanalysis.ai shows these differences in the cost to run the whole eval, which I think is more realistic pricing:

Gemini 2.5 flash (27 score): $172 (1.0x)

Gemini 2.5 pro (35 score): $649 (3.8x)

Gemini 3.0 Flash (46 score): $278 (1.6x)

Gemini 3.5 Flash (55 score): $1,552 (9.0x or 2.4x compared to 2.5 pro)

This is a massive price increase... 5.6x compared to Gemini 3.0 Flash
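
The multipliers above can be reproduced from the four dollar figures alone. A minimal sketch, assuming the eval costs quoted in this comment are accurate; `EVAL_COST_USD` and `multiplier` are illustrative names, not anything published by artificialanalysis.ai.

```python
# Total cost in USD to run the whole eval, as quoted in the comment above.
EVAL_COST_USD = {
    "Gemini 2.5 Flash": 172,
    "Gemini 2.5 Pro": 649,
    "Gemini 3.0 Flash": 278,
    "Gemini 3.5 Flash": 1552,
}


def multiplier(model: str, baseline: str) -> float:
    """How many times more the eval costs on `model` than on `baseline`."""
    return round(EVAL_COST_USD[model] / EVAL_COST_USD[baseline], 1)


if __name__ == "__main__":
    for baseline in ("Gemini 2.5 Flash", "Gemini 2.5 Pro", "Gemini 3.0 Flash"):
        print(f"3.5 Flash vs {baseline}: {multiplier('Gemini 3.5 Flash', baseline)}x")
```

Because these are whole-eval costs, they fold the higher token usage into the number, which is why the jump looks larger than the per-token price list suggests.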

doginasuit 9 hours ago

They probably never intended to keep serving cheap models. This is a natural way to introduce the squeeze, now that they have people who built services on their API. It makes a lot of sense to have an abstraction layer where the provider doesn't matter. If you are working in Kotlin, Koog is excellent.
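
A rough idea of what such an abstraction layer could look like, sketched in Python rather than Kotlin. This is not Koog's API: `ChatProvider`, `EchoProvider`, and `complete_with_fallback` are hypothetical names, and the stand-in provider just echoes the prompt where a real one would call a vendor SDK.

```python
from dataclasses import dataclass
from typing import Protocol


class ChatProvider(Protocol):
    """Anything with a name and a complete() method counts as a provider."""

    name: str

    def complete(self, prompt: str) -> str: ...


@dataclass
class EchoProvider:
    """Hypothetical stand-in for a vendor client; a real one would call the vendor SDK."""

    name: str
    healthy: bool = True

    def complete(self, prompt: str) -> str:
        if not self.healthy:
            raise RuntimeError(f"{self.name} unavailable")
        return f"[{self.name}] {prompt}"


def complete_with_fallback(providers: list[ChatProvider], prompt: str) -> str:
    """Try providers in order and return the first successful completion."""
    errors = []
    for provider in providers:
        try:
            return provider.complete(prompt)
        except RuntimeError as exc:
            errors.append(str(exc))
    raise RuntimeError("all providers failed: " + "; ".join(errors))


if __name__ == "__main__":
    chain = [EchoProvider("primary", healthy=False), EchoProvider("backup")]
    print(complete_with_fallback(chain, "hello"))
```

Application code only ever sees `complete_with_fallback`, so reordering or swapping providers is a config change rather than a rewrite. It does not remove the need to re-run evals when prompts behave differently across models.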

lanthissa 8 hours ago
switching models is insanely cheap compared to token cost on anything significant, this is a take so cynical it misses the reality
- Clueed 6 hours ago
  
  in any corporate or half compliance-relevant setting switching isn't trivial. new DPA, subprocessor notifications, TIA, procurement review, security questionnaires, plus re-running your evals because prompts don't transfer 1:1. token cost is just one of the line items.
  
opsnooperfax 2 hours ago

I think the big 3 are cartelizing and starting to ratchet up costs. GPT5.5 is not easily distinguishable from 5.1. I wouldn't be shocked if we hit the ceiling and everyone is quietly positioning for the exit.
hnarn 8 hours ago

> now that they have people who built services on their API
People really can’t wait to be the next Zynga

rudedogg 9 hours ago

If Google is actually getting cheaper inference than everyone else with their TPUs, this smells like trouble to me. Maybe serving LLMs at a profit is proving difficult.

Or maybe they think because their benchmarks are good they can ramp up the prices. Seems like they don’t have the market share to justify a move like that yet to me.

tempaccount420 9 hours ago
This is not priced at inference cost.
My guess: it's the price at which they make more money than if they rent the TPUs to other companies.
The Gemini team has had trouble securing enough TPUs for their users' needs. They struggle with load and their rate limits are really bad. Maybe at a higher price, they have a better chance at getting more TPUs assigned?
- gpm 8 hours ago
  
  The price at which they could rent out the TPUs, i.e. the market rate, is the inference cost.
  Just because you are vertically integrated doesn't mean you get to discount one business unit's products to another. Doing so discounts the opportunity cost you pay and is just bad accounting.
  
spyckie2 8 hours ago
It's probably that in 1 or 2 years local (free) models will completely take the place of cheap models, so cheap models need to move up the quality chain.
You have free local models for most tasks, $20 subscriptions for near-frontier intelligence, and API per token costs for frontier intelligence.
Flash seems to be targeting the near-frontier category.
- TurdF3rguson 7 hours ago
  
  That might work if it wasn't for FOMO. Are you ok with only $20 of frontier usage a month?
  
booty 8 hours ago
Prevailing wisdom is that serving LLMs at a profit is achievable... it's when you factor in the cost of training them that prices get astronomical real fast.
Open-source model inference providers (who do not have to bear the cost of training) seem able to do it at much lower prices.
https://www.together.ai/pricing
https://fireworks.ai/pricing#serverless-pricing (scroll down to headline models)
Of course, it's possible that they are burning through investor cash as well, and apples-to-apples comparisons are not possible because AFAIK Google does not mention the size/paramcount for 3.5 Flash.
But if the prevailing wisdom is true, I think it's actually encouraging. It suggests that OpenAI and Anthropic could perhaps, if they need to, achieve profitability if they slow down model development and focus on tooling etc. instead. If true that's probably good news for everybody w.r.t. preventing a bursting of this economic bubble.
...my opinions here are of course, conjecture built on top of conjecture....
- eklitzke 4 hours ago
  
  Most of the training cost is not in the final training run, it's in all of the R&D (including salaries, equity, etc.) that it takes to get to the final training run. The actual cost of all of the TPUs (or GPUs), power, networking, storage, etc. for the final training run is significant, but it's even more expensive to have this huge R&D team doing frontier model development and using a lot of those same resources during development.
  I think you're right that releasing models at a slower cadence would bring down costs to some degree, but it's not clear how much. All of these companies could significantly reduce their opex but at the risk of falling behind in terms of being at the frontier.
- HDBaseT 6 hours ago
  
  Not to discredit you, because you are 100% correct, but a tangential note about together.ai: they seem fairly unreliable, with constant outages or higher-than-normal latency.
BoorishBears 6 hours ago

This is trouble if you're not Google/OpenAI/Anthropic: they're all shifting towards pricing for the economic value of the knowledge work they're aiding.
The economic value increases non-linearly as models get more intelligent: being 10% more capable unlocks way more than 10% in downstream value.
That's trouble because the non-linear component means at some point their margins will stop being primarily defined by the cost of compute, and start being dominated by how intelligent the model is.
At that point you can expect compute prices to skyrocket and free capacity to plummet, so even if you have a model that's "good enough", you can't afford to deploy it at scale.
(and in terms of timing, I think they're all well under the curve for pricing by economic value. Everyone is talking about Uber spending millions on tokens, but how much payroll did they pay while devs scrolled their phones and waited for CC to do their job?)
IncreasePosts 9 hours ago
Maybe the margins are just very large for Google because they predict so much demand for 3.5?
- GodelNumbering 9 hours ago
  
  This, combined with locally runnable models getting pretty good recently (e.g. Qwen 3.6), tells me that it's time to seriously consider a local dev setup again
  

hei-lima 9 hours ago

We need another "Deepseek moment" or else it will become impossible for the regular dude to use AI. It will become something that only big companies can afford.

SwellJoe 8 hours ago
We're having DeepSeek moments every couple of weeks.
Qwen 3.6 hit hard in the self-hosting space. It's incredibly capable for its size, really shaking up what's possible in 64GB or even 32GB of VRAM.
The Prism Bonsai ternary model crams a tremendous amount of capability into 1.75GB.
And, DeepSeek V4 is crazy good for the price. They're charging flash model prices for their top-tier Pro model, which is competitive with the frontier of a few months ago.
The winners in the AI war will be the companies that figure out how to run them efficiently, not the ones that eke out a couple percent better performance on a benchmark while spending ten times as much on inference (though the capability has to be there, I think we're seeing that capability alone isn't a strong moat...there's enough competent competition to ensure there's always at least a few options even at the very frontier of capability).
- Zambyte 7 hours ago
  
  > It's incredibly capable for its size, really shaking up what's possible in 64GB or even 32GB of VRAM.
  You can lower that to at least 24GB. I've been running Qwen 3.5 and 3.6 with codex on a 7900 XTX and the long horizon tasks it can handle successfully have been blowing my mind. I would seriously choose running my current local setup over (the SOTA models + ecosystem) of a year ago just based on how productive I can be.
  
- trollbridge 7 hours ago
  
  We have Qwen 3.6-35b (6) on a 5090 (32GB) and it's blowing me away. Works fine for most (not all) code generation tasks. One developer here has been extremely stubborn about adopting AI; he's finally adopted it, albeit only when it's coming from a local model like this.
  DeepSeek V4 Pro likewise is insanely good for the price. I simply point it at large codebases, go get a cup of coffee or browse Hacker News, and then it's done useful work. This was simply not possible with other models without hitting budget problems.
  
squidbeak 9 hours ago
Deepseek had another moment a few weeks ago. V4 isn't far behind the US frontier, and so far its flash variant seems a very reliable coder and costs a pittance.
- ai_fry_ur_brain 8 hours ago
  
  Deepseek V4 (not flash) tripled in price too by the way (from Deepseek). Get used to this pattern.
  This is what you get for relying on the generosity of billionaires. Keep offshoring your thinking ability to a machine and let me know how competitive you are. Hint: you won't be. There's nothing special about being able to use an LLM.
  
xbmcuser 8 hours ago
What we need is a DeepSeek moment in hardware, i.e. China reaching parity on node size. That is the only way the latest computers, let alone the latest AI, will be available to us in the future; otherwise the profit margins will push most production to AI.
- blackoil 33 minutes ago
  
  Open source ASML EUV. But that will wipe trillions off US stocks, so 401ks may not like that.
- throwa356262 8 hours ago
  
  To be honest, China not having access to the latest hardware is exactly what has driven LLM technology forward the last 2 years.
  
stared 5 hours ago

We have a "DeepSeek moment", https://huggingface.co/bartowski/Qwen_Qwen3.6-35B-A3B-GGUF
segmondy 9 hours ago
You can use lots of open weight models today.
- hei-lima 8 hours ago
  
  That's one solution to the problem. But it still needs some good computational capabilities. Either we optimize the hell out of those models, or we wait for the hardware to become good enough for them.
- Gigachad 6 hours ago
  
  The real problem is the hardware to run them is still very expensive.
pianopatrick 7 hours ago

Maybe we can figure out better ways to use the models that can run on cheap hardware.
GeorgeOldfield 8 hours ago
gemini isn't even that good. just tested 3.5 on my usual complex prompts against opus/chat 5.5. meh
- k8sToGo 8 hours ago
  
  Are you really comparing flash to opus? Shouldn't you be comparing pro?
  
- bachmeier 7 hours ago
  
  Who would have guessed that something costing roughly a third as much wouldn't do as well at certain tasks.
- kmac_ 8 hours ago
  
  Well, the first impression is that Gemini still goes off the instruction rails more easily than other models, but I noticed that it tends to go back to the initial goal without hand-holding, which is a real improvement. It's really interesting that these models behave so differently.

fnordsensei 9 hours ago

3.5 flash is listed as stable rather than preview, or am I misreading?

https://ai.google.dev/gemini-api/docs/models/gemini-3.5-flas...

GodelNumbering 9 hours ago

ah I mistakenly wrote preview

dr_dshiv 9 hours ago

3.1 flash lite — $0.25/$1.50 — plus insanely fast.

3.1 flash lite isn’t quite as good as 3 flash preview (which is the most incredible cheap model… I really love it) — but 3.1 is half the price and the insane speed opens up different use cases.

For comparison, Opus models are $5/$25

SwellJoe 8 hours ago
Opus 4.7 is smarter than even Gemini 3.1 Pro on nearly every metric, though. You're comparing apples to oranges. Gemini 3.1 Flash is somewhere in the neighborhood between current Haiku and Sonnet, I think? Still a better value than the Anthropic models, I guess, which are quite pricey.
Since Gemini 3.5 Flash is raising the price to $1.50/$9.00, it's priced between Haiku and Sonnet. If it outperforms Sonnet, it remains a good value, I guess. Though DeepSeek V4 Flash is much cheaper than all of them, and seemingly competitive.
- WarmWash 7 hours ago
  
  >Opus 4.7 is smarter than even Gemini 3.1 Pro on nearly every metric,
  Outside of coding, claude models are pretty meh. GPT and Gemini are the workhorses of science/math/finance.
  

OakNinja 7 hours ago

To be fair, Gemini 3.1 flash _lite_ supports structured output (guaranteed json), it’s super fast, runs circles around 2.5 flash and costs $0.25/$1.50.

I use it _a lot_ and it’s very capable if you just plan correctly. I actually almost exclusively use 3.1 flash lite and 2.5 flash lite (even cheaper) and we have 99.5% accuracy in what we do.

That said, I think we’ll see the lite/flash models and the pro models diverge more price-wise. The pro models will become more and more expensive.

WhitneyLand 8 hours ago

Their rationale might be that its size and intelligence are growing relative to the market.

Fwiw it’s beating Claude Sonnet in most benchmarking (benchmaxxing?), yet they’ve priced it almost half off on a per token basis.

Question is are you going to persuade anyone with this argument?

Are there many devs at Google who legit prefer Gemini over Claude and Codex? Would love to hear about that.

SyneRyder 7 hours ago
> Are there many devs at Google who legit prefer Gemini over Claude and Codex? Would love to hear about that.
A few weeks ago, Steve Yegge claimed he'd heard that Google employees are banned from using Claude & Codex.
https://x.com/Steve_Yegge/status/2046260541912707471
A number of Googlers replied to say that was totally false, including Demis Hassabis, but they were all on the DeepMind team.
https://x.com/demishassabis/status/2043867486320222333
This person here claims they left Google because of the ban, and because the ban applied outside of Google work as well:
https://x.com/mihaimaruseac/status/2046272726881693960
- myko 2 hours ago
  
  > and because the ban applied outside of Google work as well
  I think false (or hasn't filtered to everyone lol)

dbbk 9 hours ago

I don't think they're really comparable. Seems they created the Flash-Lite tier to take the spot of the old Flash models.

GodelNumbering 9 hours ago
No, 2.5 had both flash and flash lite.
- mlmonkey 8 hours ago
  
  It is Google, after all ....

photonair 9 hours ago

In general, Gemini Flash is still relatively cheap compared to the "mini" versions from the other big 2. However, I agree that newer versions seem to come with multi-x price increases (similar to the new ChatGPT) and we certainly need competition from the open source models to keep these guys in check with pricing.

malloryerik 2 hours ago

To me this is almost like a tone-deaf naming change.

Empty Slot (new Pro as Mythos competitor?)

Old Pro -> now Flash

Old Flash -> now Flash Lite

Old Flash Lite -> now Gemma (and not served by Google)

I say "almost" because the situation is more fluid and unstable than a normal naming change. If Apple were to do this with laptops, maybe it'd be like, Air gets better and pricier and becomes Pro-level model, Neo same way becomes Air-level model, etc. But Apple's too design oriented to do something like that. Google, well...

This change has made me decide to move to a multi-provider situation like through OpenRouter for consumer-facing LLM api in a service I'm building. I just can't trust Google to not constantly rearrange everything under our feet. Doesn't mean I won't use Gemini, but it clearly means I need to have others in the mix ready to go. In fact I used to use lots of Flash Lite, which is now Gemma territory, and I can't get that served by Google anymore and don't want to run my own hardware.

But in any case, I'd compare this "Flash" model with previous "Pro" on all metrics. It's kinda like if in clothes a Small suddenly became what was a Large, or at Starbucks a Grande became the new de facto Venti. And only for the new! drinks.

And if we think this way, it's possible that prices are actually falling?

LetsGetTechnicl 9 hours ago

Gen AI is unprofitable, especially at the insanely cheap rates they've been offering to get people in the door. So expect more increases in the future.

roadside_picnic 8 hours ago
These companies are unprofitable (as all companies at this stage and ambition should be) but I increasingly don't see any justification for the idea that it is fundamentally unprofitable.
Inference alone is certainly profitable. I'm running models at home, for free, that are comparable in performance to paid models of a year or so ago. Even for much larger models the costs around inference serving are clearly manageable.
Training is where the costs are, but I'm increasingly convinced those too could have costs dramatically reduced if necessary. Chinese companies like Moonshot.ai are doing fantastic work training frontier models for a fraction of the cost we're seeing from Anthropic/OpenAI.
This isn't like Uber or Doordash where the economics fundamentally don't make sense (referring to the early days of these services where rates were very cheap).
It's a compelling story that "current AI is unsustainable", but it doesn't pan out in practice for a multitude of reasons (not the least of which is that we can always fall back to what models did last year for basically free).
- ReliantGuyZ 8 hours ago
  
  And if you can run those strong models at home for free, why would hosting them be a successful business for any of these providers?
  Profitable maybe, in terms of having low costs, but why pay Google or whoever when you can do it yourself for cheaper/"free"?
  
- overrun11 6 hours ago
  
  Arguably nothing even has to change with training for this to be sustainable. Dario has claimed that Anthropic is profitable on a per training run basis. They aren't profitable because they choose to keep investing in increasingly large training runs.
  
- LetsGetTechnicl 8 hours ago
  
  If it's profitable, why haven't they reported any profits? People like Ed Zitron have done the math and it just doesn't add up. I mean he just published this piece today: https://www.wheresyoured.at/ai-is-too-expensive/
  
- booty 8 hours ago
  
  Yeah, at this point I think the worst-case scenario for OpenAI/Anthropic/etc is to slow down frontier model development and focus on tooling and services, as opposed to imploding completely and bursting the economic bubble. I hope?
GaggiX 9 hours ago
If you don't need SOTA or near SOTA there are plenty of dirt cheap models, just look at Gemma 4 31B on Openrouter.
- Gigachad 6 hours ago
  
  For all of the use cases being hyped you really do, and you actually need something much better than the SOTA models to do what we are being told can be done.
  The small models are useful for small things like summarizing text or search but not much else.
  
npn 8 hours ago
It is insanely profitable though, if you cut out r&d cost, plus the marketing and loss leaders. Don't let them gaslight you.
Even Anthropic, which does not own any hardware, still has a big margin providing Claude models.
- LetsGetTechnicl 8 hours ago
  
  Then why haven't they reported any profits using GAAP (generally accepted accounting principles)? They all use ARR which is easily gamed.
  
- timmytokyo 6 hours ago
  
  Everything is insanely profitable if you ignore the costs.
  

ilia-a 9 hours ago

Yeah, it is a massive jump in price, hardly a "Flash" model anymore... I wonder if they'll release flash lite or something with a bit more affordable price point.

OakNinja 7 hours ago

There’s already a flash lite tier since 2.5. Latest is 3.1 currently.

irthomasthomas 9 hours ago

And they are using this to power search answers?

CooCooCaCha 8 hours ago

I bet the API pricing helps pay for search users

llm_nerd 8 hours ago

It might be temporary pricing given that 3.5 Flash is actually superior to the existing 3.1 Pro in almost all regards, so they're in a bit of a lurch as 3.1 Pro really doesn't make sense given that 3.5 Pro has been delayed a bit.

SwellJoe 8 hours ago

That's a lot. DeepSeek v4 Flash is just over a tenth the price, and DeepSeek v4 Pro is roughly the same price (currently heavily discounted, but will be $1.74).

I mean, the benchmarks for Gemini 3.5 Flash are very strong, but at those prices it has to be. I guess the time of subsidized tokens from the big guys is slowly coming to an end.

copperx 5 hours ago

They have said AI will be priced like a utility, meaning $100-300 per month or so.

dzhiurgis 3 hours ago

I use Gemini models in Junie daily. When I need accuracy I switch to Gemini 3.1 Pro Preview (why is it still in preview?), but it burns through credits, leaving me topping up $5 every day. 3.1 Flash Lite is just not accurate enough. 3 Flash is the sweet spot, just as JetBrains suggests.

Maybe I'll look at Opus again, but it was just slower, much more expensive and, worst of all, wasn't listening to your instructions.

verdverm 8 hours ago

At the same time, it is supposedly Gemini 3.1 Pro level at 3/4 the price

and far cheaper than comparable models, Gemini Pro is cheaper than Claude Sonnet (Anthropic still gets to charge a brand premium)

throwa356262 8 hours ago

Gemini 2.5 flash was the best Gemini model.

Not the most intelligent but perfect balance of cheap, fast and not-too-dumb.

m3kw9 8 hours ago

just subscribe to the plan, cheaper