Comment by simonw

2 days ago

What gives you the impression the progress is plateauing?

I'm finding the difference just between Sonnet 4 and Sonnet 4.5 to be meaningful in terms of the complexity of tasks I'm willing to use them for.

> I'm finding the difference just between Sonnet 4 and Sonnet 4.5 to be meaningful in terms of the complexity of tasks I'm willing to use them for.

That doesn't mean "not plateauing".

It's better, certainly, but the difference between SOTA now and SOTA 6 months ago is a fraction of the difference between SOTA 6 months ago and SOTA 18 months ago.

It doesn't mean that the models aren't getting better; it means that the improvement in each generation is smaller than the improvement in the previous generation.

  • 18 months ago to 6 months ago was indeed a busy period - both multimodal image input and reasoning models were rare at the start of that time period and common by the end of it.

    Comparing a 12 month period to a 6 month period feels unfair to me though. I think we will have a much fuller picture by the end of the year - I have high expectations for the next wave of Chinese models and for Gemini 3.

    • > Comparing a 12 month period to a 6 month period feels unfair to me though.

      Okay. Let me clarify then.

      The difference between SOTA now and SOTA 6 months ago is a fraction of the difference between SOTA 6 months ago and SOTA 12 months ago.

      That's still "plateauing". The performance of the models, should you take the time to chart it, is clearly asymptotic and we're in the flattening-out phase now.

      I also observe that all the models are converging on roughly the same performance, which makes me think that we are approaching some maximum with the current approach.