Comment by asar

13 hours ago

$1.50/M input tokens, $9/M output tokens

6x the price of 3.1 Flash-Lite

18 comments

asar

"Flash-Lite" is a different product from "Flash", which is more expensive. The naming couldn't be more confusing, though, especially since there's a 3.1 Pro but no non-Lite 3.1 Flash.

WarmWash 12 hours ago

I haven't used 3.5 at all yet, but previous Gemini (and Gemma) models have been by far the most token-light per task of any model.

Cost per task is a more useful measure, but obviously a harder one to benchmark.
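A minimal sketch of the cost-per-task idea: a task's cost is its input tokens times the input price plus its output tokens times the output price. Only the $1.50/M and $9/M prices come from the thread; the second model's prices and all token counts are hypothetical, chosen to show how a model that is pricier per token can still be cheaper per task if it uses fewer tokens.

```python
def cost_per_task(input_tokens: int, output_tokens: int,
                  price_in_per_m: float, price_out_per_m: float) -> float:
    """Dollar cost of one task given its token usage and $-per-million-token prices."""
    return (input_tokens * price_in_per_m + output_tokens * price_out_per_m) / 1_000_000


# Token-light model at the quoted $1.50 / $9 pricing (token counts are made up).
terse = cost_per_task(20_000, 5_000, price_in_per_m=1.50, price_out_per_m=9.00)

# Hypothetical model that is cheaper per token but needs 3x the output tokens.
verbose = cost_per_task(20_000, 15_000, price_in_per_m=1.00, price_out_per_m=6.00)

print(f"terse: ${terse:.3f} per task, verbose: ${verbose:.3f} per task")
```

With these made-up numbers the higher-priced model comes out at $0.075 per task versus $0.11, which is why per-token list prices alone can mislead.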

iwhalen 13 hours ago

I wonder why they didn't discuss price in the post?

Compare to the GPT-5.5 announcement: https://openai.com/index/introducing-gpt-5-5/

himata4113 13 hours ago

I don't think input/output pricing matters much; 90% of the cost is cache. $0.15/M for cached tokens is pretty good, but still very expensive.

wolttam 13 hours ago
It depends on the use case. Yes, 90% of the cost is cache in agentic coding scenarios (actually 95% in my experience), but not when the model reasons for 200k+ tokens before answering a complex problem.
- himata4113 12 hours ago
  
  Gemini models solve a problem in 80% fewer tokens, so that's something to think about.
  
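A rough sketch of the two cases being contrasted here, splitting one request's bill into cached input, uncached input and output. Prices are the $1.50/M input and $9/M output from the top of the thread plus the $0.15/M cached price mentioned in this thread (10% of the input price); both token mixes are hypothetical, not measurements.

```python
from dataclasses import dataclass

# $ per million tokens; the cached price is assumed to be 10% of the input price.
PRICE_INPUT = 1.50
PRICE_CACHED = 0.15
PRICE_OUTPUT = 9.00


@dataclass
class CostBreakdown:
    cached: float    # dollars spent on cached prompt tokens
    uncached: float  # dollars spent on uncached prompt tokens
    output: float    # dollars spent on output (including reasoning) tokens

    @property
    def total(self) -> float:
        return self.cached + self.uncached + self.output

    def share(self, part: str) -> float:
        """Fraction of the total cost contributed by one component."""
        return getattr(self, part) / self.total


def request_cost(cached_tokens: int, uncached_tokens: int, output_tokens: int) -> CostBreakdown:
    per_m = 1_000_000
    return CostBreakdown(
        cached=cached_tokens * PRICE_CACHED / per_m,
        uncached=uncached_tokens * PRICE_INPUT / per_m,
        output=output_tokens * PRICE_OUTPUT / per_m,
    )


# Hypothetical agentic-coding turn: long, mostly cached context, short reply.
agentic = request_cost(cached_tokens=198_000, uncached_tokens=2_000, output_tokens=500)
# Hypothetical reasoning-heavy request: short prompt, 200k tokens of reasoning/output.
reasoning = request_cost(cached_tokens=0, uncached_tokens=5_000, output_tokens=200_000)

for name, c in [("agentic", agentic), ("reasoning", reasoning)]:
    print(f"{name}: ${c.total:.4f}, cache {c.share('cached'):.1%}, output {c.share('output'):.1%}")
```

With these made-up mixes the agentic turn spends roughly 80% of its cost on cached reads, while the reasoning request spends over 99% on output tokens, so the output price dominates there.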
simonw 12 hours ago
Gemini caching is confusing though:
$0.15 / million tokens; $1.00 / million tokens per hour (storage price)
I much prefer the OpenAI/DeepSeek way of pricing caching where you don't have to think about storage price at all - you pay for cached tokens if you reuse the same prefix within a (loosely defined) time period.
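A sketch of why a storage price adds something to think about, using the two prices quoted above: storage accrues per hour whether or not the cache is read, whereas the OpenAI/DeepSeek-style model charges only for cached reads. Assumptions: every read hits the cache, and any charge for initially writing the cache is ignored; the 100k-token context, 2 hours and 50 reads are hypothetical.

```python
PRICE_CACHED_READ = 0.15   # $ per million cached tokens read (quoted above)
PRICE_STORAGE_HOUR = 1.00  # $ per million tokens stored per hour (quoted above)


def explicit_cache_cost(context_tokens: int, hours_stored: float, reads: int) -> float:
    """Storage for the whole period plus cached-read charges; cache write cost ignored."""
    millions = context_tokens / 1_000_000
    storage = millions * PRICE_STORAGE_HOUR * hours_stored
    read = millions * PRICE_CACHED_READ * reads
    return storage + read


def reads_only_cost(context_tokens: int, reads: int) -> float:
    """Pay only for cached reads, with no storage term."""
    return context_tokens / 1_000_000 * PRICE_CACHED_READ * reads


explicit = explicit_cache_cost(100_000, hours_stored=2, reads=50)
reads_only = reads_only_cost(100_000, reads=50)
print(f"with storage: ${explicit:.2f}, reads only: ${reads_only:.2f}")
```

Under these assumptions the storage term adds $0.20 on top of $0.75 of reads; it matters more when a large context sits in the cache for hours but is read rarely.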
- simonw 11 hours ago
  
  As far as I can tell Gemini caching DOES work like OpenAI - see implicit caching here: https://ai.google.dev/gemini-api/docs/caching
  I confirmed this by running a bunch of prompts through Gemini 3.5 Flash without doing anything special to configure caching and noting that it comes back with a "cachedContentTokenCount" on many of the responses.
  The "storage price" quoted is for an optional Gemini feature that most people don't care about: https://ai.google.dev/gemini-api/docs/caching#explicit-cachi...
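A minimal sketch of the check described above: reading `promptTokenCount` and `cachedContentTokenCount` from the `usageMetadata` object of each response body and computing a token-weighted cache hit rate. It assumes the REST JSON shape, treats a missing `cachedContentTokenCount` as 0, and assumes the prompt count already includes cached tokens; the response bodies below are made up and trimmed to the usage fields.

```python
import json


def _usage(response: dict) -> tuple[int, int]:
    """Return (prompt tokens, cached tokens); a missing cached count is treated as 0."""
    usage = response.get("usageMetadata", {})
    return usage.get("promptTokenCount", 0), usage.get("cachedContentTokenCount", 0)


def cache_hit_fraction(response: dict) -> float:
    """Fraction of one response's prompt tokens that were served from cache."""
    prompt, cached = _usage(response)
    return cached / prompt if prompt else 0.0


def aggregate_hit_rate(responses: list[dict]) -> float:
    """Token-weighted hit rate across many responses."""
    totals = [_usage(r) for r in responses]
    prompt = sum(p for p, _ in totals)
    cached = sum(c for _, c in totals)
    return cached / prompt if prompt else 0.0


# Hypothetical response bodies (only the usage fields are shown).
raw = [
    '{"usageMetadata": {"promptTokenCount": 12000, "cachedContentTokenCount": 8192}}',
    '{"usageMetadata": {"promptTokenCount": 12500}}',
    '{"usageMetadata": {"promptTokenCount": 13000, "cachedContentTokenCount": 12288}}',
]
responses = [json.loads(s) for s in raw]

for r in responses:
    print(f"per response: {cache_hit_fraction(r):.0%}")
print(f"overall: {aggregate_hit_rate(responses):.1%}")
```

Weighting by tokens rather than averaging per-response fractions keeps one short, fully cached prompt from overstating the hit rate.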
__jl__ 13 hours ago
In our experience, caching is not very reliable with Google. We keep getting random cache misses that don't happen with other providers. We find OpenAI, Anthropic and Fireworks (which we use a lot) all have higher cache hit rates. So it's not only about the cost of cached tokens but also about the cache hit rate you actually get.
- svachalek 12 hours ago
  
  In my experience Google is the flakiest in general, which is surprising considering the rock-solid history of their search and other products. It's just more likely to not respond at all, to give a response out of left field, or to handle the same error in 12 different ways at random (a rainbow of HTTP status codes and error messages), etc.
  
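A sketch of why the hit rate matters as much as the cached price: with a fraction `h` of prompt tokens billed at the cached price and the rest at the full input price, the blended input price is `h * cached + (1 - h) * input`. Prices are the $1.50/M input and $0.15/M cached figures from this thread; the hit rates are hypothetical.

```python
def effective_input_price(hit_rate: float, price_input: float, price_cached: float) -> float:
    """Blended $/M for prompt tokens when a fraction `hit_rate` is billed at the cached price."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * price_cached + (1.0 - hit_rate) * price_input


for h in (0.95, 0.90, 0.70, 0.50):
    print(f"hit rate {h:.0%}: ${effective_input_price(h, 1.50, 0.15):.3f}/M")
```

Dropping from a 90% to a 70% hit rate takes the blended price from $0.285/M to $0.555/M, nearly doubling it, even though the listed cached price never changes.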
minimaxir 13 hours ago
10% of input pricing is standard, especially compared to the competition.
- himata4113 13 hours ago
  
  Yeah, which means that in the end the input price plus the cache discount (10x) is the only thing worth paying attention to. If Google started offering a 20x discount, it would be twice as cheap even with input and output prices unchanged.
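Treating the 10x-to-20x claim as arithmetic: if a share `s` of the bill is cached tokens and only the cached price changes, the new bill is `s * (10 / 20) + (1 - s)` times the old one. The 90% and 95% shares are the figures quoted earlier in the thread; the sketch assumes usage stays the same.

```python
def cost_ratio_after_discount_change(cache_share: float, old_discount: float,
                                     new_discount: float) -> float:
    """New bill / old bill when only the cached-token price changes.

    cache_share: fraction of the current bill spent on cached tokens (0..1).
    old_discount, new_discount: cached price expressed as input price / discount.
    """
    return cache_share * (old_discount / new_discount) + (1.0 - cache_share)


for share in (0.90, 0.95, 1.00):
    ratio = cost_ratio_after_discount_change(share, old_discount=10, new_discount=20)
    print(f"cache share {share:.0%}: bill x{ratio:.3f} ({1 / ratio:.2f}x cheaper)")
```

At a 90% cache share the bill falls to 0.55 of the original (about 1.8x cheaper), approaching a full 2x only as the cache share approaches 100%.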

stri8ed 13 hours ago

Output cost is 3x that of Gemini 3 Flash.