Comment by maeil
6 months ago
This isn't news. The other popular benchmarks are just as gamed and worthless; it would be shocking if this one wasn't. The other frontier model providers game them just as hard, so it's not an OpenAI thing. Any benchmark that a provider themselves mentions is not worth the pixels it's written on.