
Comment by noodletheworld

14 hours ago

> What's the metric?

Language model capability at generating text output.

The model progress this year has been a lot of:

- “We added multimodal”

- “We added a lot of non AI tooling” (ie agents)

- “We put more compute into inference” (ie thinking mode)

So yes, there is still rapid progress, but these ^ make it clear, at least to me, that next gen models are significantly harder to build.

Simultaneously we see a distinct narrowing between players (OpenAI, DeepSeek, Mistral, Google, Anthropic) in their offerings.

That's usually a signal that the rate of progress is slowing.

Remind me what was so great about gpt 5? How about gpt4 from gpt 3?

Do you even remember the releases? Yeah. I don't. I had to look it up.

Just another model with more or less the same capabilities.

“Mixed reception”

That is not what exponential progress looks like, by any measure.

The progress this year has been in the tooling around the models and in smaller, faster models with similar capabilities. Multimodal add ons that no one asked for, because it's easier to add image and audio processing than to improve text handling.

That may still be on a path to AGI, but it is not an exponential path to it.

I don’t think the path was ever exponential, but your claim here reads almost as if the slowdown hit an asymptote-like wall.

Most of the improvements are intangible. Can we truly say how much more reliable the models are? We barely have quantitative measurements on this so it’s all vibes and feels. We don’t even have a baseline metric for what AGI is and we invalidated the Turing test also based on vibes and feels.

So my argument is that part of the slowdown is itself a hallucination, because the improvement is not actually measurable or definable outside of vibes.

> Language model capability at generating text output.

That's not a metric, that's a vague non-operationalized concept that could be operationalized into an infinite number of different metrics. And an improvement that was linear in one of those possible metrics would be exponential in another one (well, actually, one that was linear in one would also be linear in an infinite number of others, as well as being exponential in an infinite number of others).

That’s why you have to define an actual metric, not simply describe a vague concept of a kind of capacity of interest, before you can meaningfully discuss whether improvement is exponential. Because the answer is necessarily entirely dependent on the specific construction of the metric.
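To make that concrete, here is a minimal sketch. The scores are made up purely for illustration, and `metric_linear` and `metric_exponential` are hypothetical names, not real benchmarks. The same underlying sequence shows constant differences (linear growth) under one metric and constant ratios (exponential growth) under a monotonic rescaling of it.

```python
# Hypothetical "capability" scores for five model generations.
# These numbers are invented for illustration only.
raw_scores = [1.0, 2.0, 3.0, 4.0, 5.0]


def metric_linear(x):
    """Metric A (hypothetical): report the raw score unchanged."""
    return x


def metric_exponential(x):
    """Metric B (hypothetical): a monotonic rescaling of the same score."""
    return 2.0 ** x


def differences(values):
    """Step-to-step differences; constant means linear growth."""
    return [b - a for a, b in zip(values, values[1:])]


def ratios(values):
    """Step-to-step ratios; constant means exponential growth."""
    return [b / a for a, b in zip(values, values[1:])]


metric_a = [metric_linear(x) for x in raw_scores]
metric_b = [metric_exponential(x) for x in raw_scores]

diffs_a = differences(metric_a)   # constant -> "progress is linear"
ratios_b = ratios(metric_b)       # constant -> "progress is exponential"

print("metric A values:", metric_a, "differences:", diffs_a)
print("metric B values:", metric_b, "ratios:", ratios_b)
```

Both metrics rank the generations identically, so neither is obviously the wrong one. Which growth curve you see depends entirely on which one you picked.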

> Language model capability at generating text output.

That's not a quantifiable sentence. Unless you put it in numbers, anyone can argue exponential/not.

> next gen models are significantly harder to build.

That's not how we judge capability progress though.

> Remind me what was so great about gpt 5? How about gpt4 from gpt 3?

> Do you even remember the releases?

At gpt 3 level we could generate some reasonable code blocks / tiny features. (An example shown around at the time was "explain what this function does" for a "fib(n)".) At gpt 4, we could build features and tiny apps. At gpt 5, you can often one-shot build whole apps from a vague description. The difference between them is massive for coding capabilities. Sorry, but if you can't remember that massive change... why are you making claims about the progress in capabilities?

> Multimodal add ons that no one asked for

Not only does multimodal input training improve the model overall, it's useful for (for example) feeding back screenshots during development.