Comment by ranger_danger

17 hours ago

Just FYI, this article seems to define "state of the art" as "popular", as measured by "total mentions and user sentiment", without any bearing on the technical abilities or actual usage of the model.

4 comments

yunusabd 16 hours ago

Calling it SOTA might be a bit provocative, but what actually is the "state of the art"? We have benchmarks, but those are increasingly gamed and don't necessarily reflect a model's actual performance; see Opus 4.7. So I think it's useful to have real-world data from actual users as an additional data point.

miyoji 2 hours ago

Maybe you shouldn't be relying on something if you can't even tell how good it is?

mellosouls 16 hours ago

That's pretty much exactly what the title says.

The technical abilities and usage are derived from the commenters' reflections on their own usage.

swyx 6 hours ago

and assuming all mentions are coding-model mentions just because it's on HN