Comment by avereveard

7 hours ago

the ecosystem obsession with public benchmarks comes from the fact that running benchmark costs, and labs don't test on any given private benchmark

but yeah you're correct anyone optimizing for public-bench rank instead of their own task-distribution eval has been pointing at the wrong thing for a while

still I guess useful signal to know which one model to consider, negative signal is still signal, assuming everyone is gaming benchmark in certain ways, lack of performance do result in a real workload effect

0 comments

avereveard

No comments yet

Contribute on Hacker News ↗