Comment by ramesh31

7 months ago

It's pretty hilarious how I've come to trust this benchmark as a gut check on frontier models more than any of the published numbers. It seems to map perfectly onto codegen ability. Based on the pelicans, Grok 4 looks to be somewhere around Claude 3.7 level.
