Comment by himata4113

11 hours ago

I don't think input/output pricing matters; 90% of the cost is cache. $0.15 is pretty good, but still very expensive.

12 comments

himata4113

It depends on the use case. Yes, 90% of the cost is cache in agentic coding scenarios (actually 95% in my experience), but not when the model reasons for 200k+ tokens before answering a complex problem.

himata4113 11 hours ago
Gemini models solve a problem in 80% fewer tokens, so that's something to think about.
- johaugum 10 hours ago
  
  Source?
  

simonw 10 hours ago

Gemini caching is confusing though:

  $0.15 / million tokens
  $1.00 / 1,000,000 tokens per hour (storage price)

I much prefer the OpenAI/DeepSeek way of pricing caching where you don't have to think about storage price at all - you pay for cached tokens if you reuse the same prefix within a (loosely defined) time period.

simonw 9 hours ago

As far as I can tell Gemini caching DOES work like OpenAI - see implicit caching here: https://ai.google.dev/gemini-api/docs/caching
I confirmed this by running a bunch of prompts through Gemini 3.5 Flash without doing anything special to configure caching and noting that it comes back with a "cachedContentTokenCount" on many of the responses.
The "storage price" quoted is for an optional Gemini feature that most people don't care about: https://ai.google.dev/gemini-api/docs/caching#explicit-cachi...
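A rough way to repeat that check over a batch of responses is sketched below. It assumes each response has been parsed into a dict from the REST JSON and that the token counts sit under `usageMetadata` (`promptTokenCount`, `cachedContentTokenCount`); the helper name `cached_fraction` and the sample data are made up for illustration.

```python
def cached_fraction(responses):
    """Sum cached and prompt tokens over parsed response dicts.

    Returns (cached_tokens, prompt_tokens, cached_tokens / prompt_tokens).
    """
    cached = 0
    prompt = 0
    for resp in responses:
        usage = resp.get("usageMetadata", {})
        prompt += usage.get("promptTokenCount", 0)
        # Treat a missing field as zero cached tokens for that response.
        cached += usage.get("cachedContentTokenCount", 0)
    fraction = cached / prompt if prompt else 0.0
    return cached, prompt, fraction


# Made-up sample: one response with a cache hit, one without.
sample = [
    {"usageMetadata": {"promptTokenCount": 10000, "cachedContentTokenCount": 8000}},
    {"usageMetadata": {"promptTokenCount": 10000}},
]

if __name__ == "__main__":
    print(cached_fraction(sample))
```

Tracking this fraction per provider is one way to compare cache hit rates rather than just list prices.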

__jl__ 11 hours ago

In our experience, caching is not very reliable with Google. We always get random cache misses that don't happen with other providers. We find OpenAI, Anthropic and Fireworks (which we use a lot) all have higher cache hit rates. So it's not only about the cost of cached tokens but also what kind of cache hit rate you get.

svachalek 10 hours ago
In my experience Google is the most flaky in general, which is surprising considering the rock-solid history of their search and other products. It's just more likely not to respond at all, to give a response out of left field, to handle the same error in 12 different ways randomly (a rainbow of HTTP status codes and error messages), etc.
- gwern 7 hours ago
  
  I agree. The https://aistudio.google.com/ is shockingly bad. I'm not sure I've ever used such a flaky Google service before. It's so much worse than Gmail or Google, not to mention ChatGPT or Claude or DeepSeek or Kimi or Midjourney web interfaces. The bizarre janky integration with your Google Drive, or Gemini or NBPs randomly erroring out, often indefinitely. I've had sessions refresh themselves and just... disappear. Or when you get frustrated with a buggy dead session and hit 'new session' and have to wait minutes for 'saving...' to happen.
- veselin 9 hours ago
  
  Exactly our experience too. Effectively, we catch these status codes and send the request to OpenAI instead. Retrying the same query in Gemini has a high chance of giving more or less the same status code.
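The fallback pattern described here can be sketched roughly as follows. Everything in it is hypothetical: `ProviderError`, the status set, and the `primary`/`fallback` callables stand in for whatever client wrappers you actually use, and the set of status codes worth failing over on is an assumption, not something stated in the thread.

```python
class ProviderError(Exception):
    """Hypothetical error type carrying the HTTP status a provider returned."""

    def __init__(self, status):
        super().__init__(f"provider returned HTTP {status}")
        self.status = status


# Assumed set of statuses that trigger a fallback; tune to what you observe.
FALLBACK_STATUSES = frozenset({429, 500, 502, 503, 504})


def complete_with_fallback(prompt, primary, fallback,
                           fallback_statuses=FALLBACK_STATUSES):
    """Call primary(prompt); on a listed status, call fallback(prompt) instead.

    The primary is not retried, since a retry tends to hit the same error.
    Errors with other statuses are re-raised unchanged.
    """
    try:
        return primary(prompt)
    except ProviderError as err:
        if err.status in fallback_statuses:
            return fallback(prompt)
        raise


if __name__ == "__main__":
    def flaky_primary(prompt):
        raise ProviderError(503)

    def steady_fallback(prompt):
        return f"fallback answer to: {prompt}"

    print(complete_with_fallback("hello", flaky_primary, steady_fallback))
```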

minimaxir 11 hours ago

10% of input pricing is standard, especially compared to the competition.

himata4113 11 hours ago

yah, which means that the input cost is the only value that should be paid attention to in the end, plus the cache discount (10x). If Google started offering a 20x discount, it would make it twice as cheap while input and output prices stayed the same.
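To put numbers on that, here is a back-of-the-envelope sketch. The $1.50 input price is only inferred from the $0.15 cached price and the 10% figure quoted in this thread; the $9.00 output price and the token mix are placeholders, so plug in your own. How close a 20x discount gets to "twice as cheap" depends on how much of the bill is cached reads.

```python
def request_cost(fresh_in, cached_in, out,
                 input_price, output_price, cache_discount):
    """Dollar cost of one workload; prices are per million tokens.

    cached_in tokens are billed at input_price / cache_discount.
    """
    cached_price = input_price / cache_discount
    total = (fresh_in * input_price
             + cached_in * cached_price
             + out * output_price)
    return total / 1_000_000


if __name__ == "__main__":
    # Placeholder mix: 95% of tokens are cached reads.
    mix = dict(fresh_in=40_000, cached_in=950_000, out=10_000,
               input_price=1.50, output_price=9.00)
    at_10x = request_cost(cache_discount=10, **mix)
    at_20x = request_cost(cache_discount=20, **mix)
    print(f"10x: ${at_10x:.4f}  20x: ${at_20x:.4f}  ratio: {at_20x / at_10x:.2f}")
```

The total only halves exactly when cached reads are the entire bill; any fresh input or output tokens dilute the saving.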