Comment by jdross

4 days ago

The pace of notable releases across the industry right now is unlike any time I remember since I started doing this in the early 2000s. And it feels like it's accelerating.

How is this a notable release? It's strictly worse than Gemini 2.5 on coding &c, and only an iterative improvement over their own models. The only thing that struck me as particularly interesting was the native visual reasoning.

  • It's not worse on coding. SWE-bench, Aider, and LiveBench coding all show noticeably better results.

Lots of releases but very little actual performance increase

  • Sonnet and Gemini saw fairly substantial perf increases recently

    • Love Sonnet, but 3.7 is not obviously an improvement over 3.5 in my real-world usage. Gemini 2.5 Pro is great and has replaced most others for me (I use Grok for things that require realtime answers)

Not really. We’re definitely in the incremental improvement stage at this point. Certainly no indication that progress is “accelerating”.

  • Integration is accelerating rapidly. Even if model development froze today, we would still probably have ~5 years of adoption and integration before it started to level off.

    • You are both correct. It feels like the tech itself is kinda plateauing but it's still massively under-used. It will take a decade or more before the deployment starts slowing down.

  • ChatGPT 3 : iPhone 1

    A bunch of models later, we're about on the iPhone 4-5 now. Feels about right.

    • It's more like GPT-3 is the Manchester Baby, and we're somewhere around IBM 700 series right now. Still a long way to go to iPhone, as much as the industry likes to pretend otherwise.
