Comment by BoumTAC

4 hours ago

To me, mini releases matter much more and better reflect the real progress than SOTA models.

The frontier models have become so good that it's getting almost impossible to notice meaningful differences between them.

Meanwhile, when a smaller / less powerful model releases a new version, the jump in quality is often massive, to the point where we can now use them 100% of the time in many cases.

And since they're also getting dramatically cheaper, it's becoming increasingly compelling to actually run these models in real-life applications.

7 comments

BoumTAC

brikym 4 hours ago

If you're doing something common then maybe there are no differences with SOTA. But I've noticed a few. GPT 5.4 isn't as good at UI work in svelte. Gemini tends to go off and implement stuff even if I prompt it to discuss but it's pretty good at UI code. Claude tends to find out less about my code base than GPT and it abuses the any type in typescript.

patates 3 hours ago

Big part of these differences may be the system prompts and/or the harness.

zozbot234 2 hours ago

> And since they're also getting dramatically cheaper, it's becoming increasingly compelling to actually run these models in real-life applications.

They're not really cheaper than the SOTA open models on third-party inference platforms, and they're generally dumber. I suppose they're still worth it if you must minimize latency for any given level of smarts, but not really otherwise.

pzo 4 hours ago

they do are cheaper than SOTA but not getting dramatically cheaper but actually the opposite - GPT 5.4 mini is around ~3x more expensive than GPT 5.0 mini.

Similarly gemini 3.1 flash lite got more expensive than gemini 2.5 flash lite.

BoumTAC 4 hours ago
But they are getting dramatically better.
What's the point of a crazy cheap model if it's shit ?
I code most of the time with haiku 4.5 because it's so good. It's cheaper for me than buying a 23€ subscription from Anthropic.
- philipkglass 4 hours ago
  
  The crazy cheap models may be adequate for a task, and low cost matters with volume. I need to label millions of images to determine if they're sexually suggestive (this includes but is not limited to nudity). The Gemini 2.0 Flash Lite model is inexpensive and performs well. Gemini 2.5 Flash Lite is also good, but not noticeably better, and it costs more. When 2.0 gets retired this June my costs are going up.

sebastiennight 2 hours ago

> 100% of the time in many cases

So, every single time, the new model works most of the time?