Comment by underlines
6 hours ago
Well, your own unleaked ones (benchmarks that haven't ended up in any model's training data) that represent your real workloads.
If you can't afford to do that, look at many benchmarks at once. E.g. artificialanalysis.com merges multiple benchmarks across weighted categories into an Intelligence Score, a Coding Score and an Agentic Score.
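A rough sketch of what weighted aggregation into category scores looks like. The benchmark names, scores, category groupings and weights below are all made up for illustration. This is not artificialanalysis.com's actual methodology (check their site for that), and it assumes every benchmark score has already been normalised to the same 0-100 scale.

```python
from typing import Dict

# Hypothetical per-benchmark scores for one model, assumed already on a 0-100 scale.
SCORES: Dict[str, float] = {
    "reasoning_bench_a": 80.0,
    "reasoning_bench_b": 60.0,
    "coding_bench": 70.0,
    "agent_bench": 50.0,
}

# Hypothetical categories: each maps benchmark name -> weight within that category.
CATEGORIES: Dict[str, Dict[str, float]] = {
    "Intelligence": {"reasoning_bench_a": 0.5, "reasoning_bench_b": 0.5},
    "Coding": {"coding_bench": 1.0},
    "Agentic": {"agent_bench": 1.0},
}


def weighted_mean(values: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of the entries of `values` named in `weights`."""
    total_weight = sum(weights.values())
    return sum(values[name] * w for name, w in weights.items()) / total_weight


def category_scores(
    scores: Dict[str, float], categories: Dict[str, Dict[str, float]]
) -> Dict[str, float]:
    """One weighted-mean score per category."""
    return {cat: weighted_mean(scores, weights) for cat, weights in categories.items()}


if __name__ == "__main__":
    print(category_scores(SCORES, CATEGORIES))
    # prints {'Intelligence': 70.0, 'Coding': 70.0, 'Agentic': 50.0}
```

The point of averaging many benchmarks is that a model which happens to overfit (or has leaked) one of them moves the category score much less than it moves that single benchmark.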