Comment by ramesh31
2 days ago
It's pretty hilarious how I've come to trust this benchmark for a gut check on frontier models more than any of the numbers available. It seems to map perfectly to codegen abilities. Based on the pelicans, Grok 4 looks somewhere around Claude 3.7 levels.