Comment by SirensOfTitan
2 days ago
I have a weakly held conviction (because it is based on my personal qualitative opinion) that Google aggressively and quietly quantizes (or reduces compute/thinking on) their models a little while after release.
The Gemini 2.5 Pro 03-25 checkpoint was by far my favorite model this year, and I noticed an extreme drop-off in response quality around the beginning of May, when they pointed that endpoint to a newer version (I didn't even know they did this until I started searching for why the model had degraded so much).
I noticed a similar effect with Gemini 3.0: it felt fantastic over the first couple weeks of use, and now the responses I get from it are noticeably more mediocre.
I'm under the impression that all of the flagship AI shops make these kinds of quiet changes after a release to save on costs (Anthropic seems like the most honest player in my experience), and that Google does it more aggressively than either OpenAI or Anthropic.
This has been a common trope here for the last couple of years. I really can't tell whether the models get worse or it's in our heads. I don't use a new model until a few months after release and I still have this experience, so they can't be degrading the models uniformly over time; it would have to be a per-user kind of thing. Possible, but then I should see a difference when I switch to my less-used (wife's) Google/OpenAI accounts, and I don't.
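For what it's worth, this is testable rather than purely vibes. A minimal sketch along these lines (the model name, prompts, and log path are placeholders, and it assumes an OpenAI-compatible chat completions endpoint) replays the same fixed prompts against a pinned model ID on a schedule and archives the raw replies, so you can diff outputs across dates instead of relying on memory:

    # Sketch: probe a pinned model ID with fixed prompts and archive the
    # raw replies to a JSONL log for later comparison. Model name, prompts,
    # and log path are illustrative placeholders.
    import json
    import os
    from datetime import datetime, timezone

    import requests

    API_URL = "https://api.openai.com/v1/chat/completions"  # or any compatible endpoint
    API_KEY = os.environ["OPENAI_API_KEY"]
    MODEL = "gpt-4o-2024-08-06"  # pin a dated snapshot, not a floating alias
    PROMPTS = [
        "Summarize the trade-offs of quantizing a 70B model to 4 bits.",
        "Write a Python function that merges two sorted lists.",
    ]

    def snapshot() -> None:
        """Send every probe prompt once and append the replies to the log."""
        stamp = datetime.now(timezone.utc).isoformat()
        with open("model_probe_log.jsonl", "a", encoding="utf-8") as log:
            for prompt in PROMPTS:
                resp = requests.post(
                    API_URL,
                    headers={"Authorization": f"Bearer {API_KEY}"},
                    json={
                        "model": MODEL,
                        "messages": [{"role": "user", "content": prompt}],
                        "temperature": 0,
                    },
                    timeout=120,
                )
                resp.raise_for_status()
                text = resp.json()["choices"][0]["message"]["content"]
                log.write(json.dumps({"ts": stamp, "model": MODEL,
                                      "prompt": prompt, "reply": text}) + "\n")

    if __name__ == "__main__":
        snapshot()  # run daily (cron etc.) and diff replies across dates

Granted, sampling noise and silent server-side changes mean identical prompts won't give identical replies, but a dated archive at least turns "it feels worse" into something you can go back and reread.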
It's the fate of anyone relying on cloud services, up to and including the complete removal of old LLM versions.
If you want stability, you go local.
Which models do you use locally?
I can definitely confirm this from my experience.
Gemini 3 feels even worse than GPT-4o right now. I don't understand the hype, or why OpenAI would need a red alert because of it.
Both Opus 4.5 and GPT-5.2 are much more pleasant to use.