VibeBench: Measuring 1k Engineers' Opinions of New Models (vibebench.standardagents.ai)
3 days ago | jpschroeder | 4 comments

mhi3, 3 days ago:
"Published benchmarks are gamed, optimized, and overfit, and no longer yield a useful signal."
Is this true?
But I love this concept!

  jpschroeder, 3 days ago:
  Oh very true. Benchmaxxing itself is basically gaming them.

ramon156, 2 days ago:
Love the idea!
Page is incredibly slow on mobile, probably the avatars

memoryleakgame, 2 days ago:
800 commits in a year...