Comment by nikcub
13 hours ago
the most cited one is Terminal-Bench 2.0, but it's also plagued by cheating accusations and benchmaxxing.
somewhat remarkably, Claude Code ranks last for Opus 4.6, which may say something about CC, or about the benchmark.