
Comment by stingraycharles

5 days ago

My issue with all these citations is that they all come from OpenAI employees.

I’ll wait to see third-party verification and/or use it myself before judging. There are a lot of incentives right now to hype things up for OpenAI.

A third party tried this experiment with publicly available models. OpenAI did half as well as Gemini, and none of the models even got bronze.

https://matharena.ai/imo/

  • I feel you're misunderstanding something. That's not "this exact experiment". Matharena is testing publicly available models against the IMO problem set. OpenAI was announcing the results of a new, unpublished model on that same problem set.

    It is totally fair to discount OpenAI's statement until we have way more details about their setup, and maybe even until there is some level of public access to the model. But you're doing something very different: implying that their results are fraudulent and (incorrectly) using the Matharena results as your proof.

    • If OpenAI published the model before the competition, one could verify it was not tinkered with afterwards, assuming there is some way for them to prove a model is the same one at all. Since the weights are not open, the most basic approach is off the table.

    • Implying the results are fraudulent is completely fair when it actually is fraud.

      The previous time they claimed to have solved all of math right then and there, they were caught owning the company that makes that independent test, and could neither admit nor deny training on the closed test set.
