Comment by ranger_danger
17 hours ago
Just FYI, this article seems to define "state of the art" as "popular", as measured by "total mentions and user sentiment", without any bearing on the technical abilities or actual usage of the model.
Calling it SOTA might be a bit provocative, but what actually is the "state of the art"? We have benchmarks, but those are increasingly gamed and don't necessarily reflect the actual performance of a model (see Opus 4.7). So I think it's useful to have real-world data from actual users as an additional data point.
Maybe you shouldn't be relying on something if you can't even tell how good it is?
That's pretty much exactly what the title says.
The technical abilities and usage are derived from the commenters' reflections on their own usage.
And it assumes all mentions are coding-model mentions just because it's on HN.