Comment by yunusabd

17 hours ago

Calling it SOTA might be a bit provocative, but what actually is the "state of the art"? We have benchmarks, but those are increasingly gamed and don't necessarily reflect a model's actual performance (see Opus 4.7). So I think it's useful to have real-world data from actual users as an additional data point.

1 comment


miyoji 4 hours ago

Maybe you shouldn't be relying on something if you can't even tell how good it is?