Comment by kamranjon

8 hours ago

It's interesting that, of the large inference providers, Google has one of the most inconvenient policies around model deprecation. They deprecate models exactly 1 year after releasing them and force you to move on to their next generation of models. I had assumed that, because they are using their own silicon, they would actually be able to offer better stability, but the opposite seems to be true. Their rate limiting is also much stricter than OpenAI's, for example. I wonder how much of this is related to these TPUs versus just strange policy decisions.

7 comments

gordonhart 8 hours ago

It's frustrating how cavalier they are about killing old Gemini releases. My read is that once a new model is serving >90% of volume, which happens pretty quickly as most tools will just run the latest+greatest model, the standard Google cost/benefit analysis is applied and the old thing is unceremoniously switched off. It's actually surprising that they recently extended the EOL date for Gemini 2.5. Google has never been a particularly customer-obsessed company...

surajrmal 8 hours ago
What benefit is there to sticking with older models? If the API is the same, what are the switching costs?
- kamranjon 7 hours ago
  
  Consistency: new models don't behave the same on every task as their predecessors. You end up building pipelines that rely on specific behavior, and then find that the new model performs worse on a specific task you were performing, or just behaves differently and needs prompt adjustments. They can also fundamentally change the default model settings in new releases; for example, Gemini 2.5 models had completely different behavior with regard to temperature settings than previous models. It just creates a moving target that you constantly have to adjust and rework, instead of providing a platform that you, and by extension your users, can rely on. Other providers have much longer deprecation windows, so they must at least understand this frustration.
- gordonhart 7 hours ago
  
  If you're trying to run repeatable workflows, stability from not changing the model can outweigh the benefits of a smarter new model.
  The cost can also change dramatically: on top of the higher token costs for Gemini Pro ($1.25/mtok input for 2.5 versus $2/mtok input for 3.1), the newer release also tokenizes images and PDF pages less efficiently by default (>2x token usage per image/page), so you end up paying much, much more per request on the newer model.
  These are somewhat niche concerns that don't apply to most chat or agentic coding use cases, but they're very real and account for some portion of the traffic that still flows to older Gemini releases.
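
A rough worked example of the cost arithmetic in gordonhart's comment above. Only the two input prices and the ">2x" multiplier come from the comment; the tokens-per-page figure and the batch size are made-up assumptions for illustration, and output-token costs are ignored.

```python
# Input prices are the ones quoted in the comment; everything else is assumed.
PRICE_2_5 = 1.25  # USD per million input tokens, Gemini 2.5 Pro (as quoted)
PRICE_3_1 = 2.00  # USD per million input tokens, Gemini 3.1 Pro (as quoted)

TOKENS_PER_PAGE_OLD = 250  # hypothetical tokens per PDF page on the older model
TOKEN_MULTIPLIER = 2.0     # lower bound of the ">2x" figure from the comment


def input_cost_usd(pages: int, tokens_per_page: float, price_per_mtok: float) -> float:
    """Input cost in USD for a batch of pages."""
    return pages * tokens_per_page / 1_000_000 * price_per_mtok


pages = 1_000  # hypothetical batch size
old_cost = input_cost_usd(pages, TOKENS_PER_PAGE_OLD, PRICE_2_5)
new_cost = input_cost_usd(pages, TOKENS_PER_PAGE_OLD * TOKEN_MULTIPLIER, PRICE_3_1)

print(f"old: ${old_cost:.4f}  new: ${new_cost:.4f}  ratio: {new_cost / old_cost:.1f}x")
```

Whatever the assumed page size, the ratio is the price increase (2 / 1.25 = 1.6) times the token multiplier, so at least 3.2x for image- and PDF-heavy input under these assumptions.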
akelly 7 hours ago
I've heard GenAI.mil still has Gemini 2.5 only.
- gordonhart 7 hours ago
  
  Wouldn't surprise me. The best model you can get on AWS GovCloud is still Claude Sonnet 4.5.

jbellis 8 hours ago

Flash 2 isn't even at EOL until June, but over the weekend we started seeing ~90% of requests fail with 429s. (So we switched to GPT 5.4 nano.)
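
One way to soften the kind of 429 storm jbellis describes is to retry with exponential backoff and then fall through to another provider. This is a minimal sketch, not anyone's actual setup: `RateLimited`, `call_with_fallback`, and the provider callables are hypothetical stand-ins rather than real SDK names.

```python
import time
from typing import Callable, Optional, Sequence


class RateLimited(Exception):
    """Hypothetical stand-in for a provider's HTTP 429 error."""


def call_with_fallback(
    providers: Sequence[Callable[[str], str]],
    prompt: str,
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try each provider in order, backing off exponentially on rate limits."""
    last_error: Optional[Exception] = None
    for call in providers:
        for attempt in range(max_attempts):
            try:
                return call(prompt)
            except RateLimited as err:
                last_error = err
                # Skip the sleep after the final attempt; move to the next provider.
                if attempt < max_attempts - 1:
                    sleep(base_delay * 2 ** attempt)
    raise RuntimeError("all providers were rate limited") from last_error
```

In practice each callable would wrap a real client and translate its 429 response into `RateLimited`. As kamranjon notes upthread, the fallback model may behave differently on the same prompt, so this trades consistency for availability.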