Comment by hodgehog11
2 months ago
I agree that this is a sensible judgement for practical use, but my point is that the vibes will likely change; it's just a matter of when. You can't draw a trendline on a nonlinear metric, especially when you have no knowledge of where the inflection point is. Individual benchmarks are certainly fallible, and we always need better ones, but the aggregate of all the benchmarks together (along with other theoretical metrics not based on test data) correlates reasonably well with opinion polling, and all of these are improving at a consistent rate. What's unclear is when these model improvements will lead to the outcomes that you're looking for. When that happens, it will look like a massive leap in performance, but really it's just a threshold being hit.
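To make the threshold argument concrete, here's a toy sketch. Everything in it is an assumption for illustration: the per-step error halving each generation, the 20-step task, and the 0.5 "usable" cutoff are made-up numbers, and `task_success` and `first_crossing` are hypothetical helpers, not anything measured from real models.

```python
def task_success(per_step_accuracy: float, steps: int) -> float:
    """Chance of finishing a task where every one of `steps` steps must succeed."""
    return per_step_accuracy ** steps


def first_crossing(accuracies, steps, threshold):
    """Index of the first generation whose task success reaches `threshold`, else None."""
    for generation, accuracy in enumerate(accuracies):
        if task_success(accuracy, steps) >= threshold:
            return generation
    return None


# Assumed numbers: per-step error starts at 20% and halves every generation,
# i.e. the underlying metric improves at a steady rate.
accuracies = [1 - 0.2 * 0.5 ** g for g in range(6)]
STEPS = 20        # assumed length of the task people actually care about
THRESHOLD = 0.5   # assumed point at which the model starts to feel "usable"

if __name__ == "__main__":
    for g, acc in enumerate(accuracies):
        print(f"gen {g}: per-step {acc:.4f}, 20-step task {task_success(acc, STEPS):.3f}")
    print("crosses threshold at generation", first_crossing(accuracies, STEPS, THRESHOLD))
```

With these assumed numbers the per-step metric creeps up smoothly, while the 20-step success rate sits near zero for the early generations and only clears the cutoff at generation 3. Nothing special happened at that generation; the steady trend just reached the threshold.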