Comment by saithound
6 months ago
Yes, it looks all but certain that OpenAI gamed this particular benchmark.
Otherwise, they would not have had a contract that prohibited revealing that OpenAI was involved with the project until after the o3 announcements were made and the market had time to react. There is no reason to have such a specific agreement unless you plan to use the backdoor access to beat the benchmark: otherwise, OpenAI would not have known in advance that o3 will perform well! In fact, if there was proper blinding in place (which Epoch heads confirmed was not the case), there would have been no reason for secrecy at all.
Google, xAI and Anthropic's test-time compute experiments were really underwhelming: if OpenAI has secret access to benchmarks, that explains why their performance is so different.
No comments yet
Contribute on Hacker News ↗