
Comment by bwfan123

5 days ago

Based on the past history with FrontierMath & AIME 2025 [1], [2], I would not trust announcements that can't be independently verified. I am excited to try it out, though.

Also, the performance of LLMs on IMO 2025 did not even reach bronze [3].

Finally, this article shows that LLMs were mostly just bluffing on USAMO 2025 [4].

[1] https://www.reddit.com/r/slatestarcodex/comments/1i53ih7/fro...

[2] https://x.com/DimitrisPapail/status/1888325914603516214

[3] https://matharena.ai/imo/

[4] https://arxiv.org/pdf/2503.21934

The solutions were publicly posted to GitHub: https://github.com/aw31/openai-imo-2025-proofs/tree/main

  • Did humans formalize the inputs, or was the exact natural-language input provided to the LLM? A lot of detail is missing on the methodology used, not to mention the lack of any independent validation.

    My skepticism stems from the past FrontierMath announcement, which turned out to be a bluff.

    • People are reading a lot into the FrontierMath articles from a couple of months ago, but tbh I don't really understand what the controversy is supposed to be there. Failing to clearly disclose the sponsorship of Epoch to build the benchmark obviously doesn't affect a model's performance on it.