Comment by bwfan123
5 days ago
Based on the past history with FrontierMath & AIME 2025 [1], [2], I would not trust announcements that can't be independently verified. I am excited to try it out, though.
Also, the performance of LLMs on IMO 2025 was not even at the bronze level [3].
Finally, this article shows that LLMs were mostly just bluffing [4] on USAMO 2025.
[1] https://www.reddit.com/r/slatestarcodex/comments/1i53ih7/fro...
The solutions were publicly posted to GitHub: https://github.com/aw31/openai-imo-2025-proofs/tree/main
Did humans formalize the inputs, or was the exact natural-language problem statement provided to the LLM? A lot of detail about the methodology is missing, not to mention any independent validation.
My skepticism stems from the earlier FrontierMath announcement, which turned out to be a bluff.
People are reading a lot into the FrontierMath articles from a couple of months ago, but tbh I don't really understand what the controversy is supposed to be there. Failing to clearly disclose that OpenAI sponsored Epoch to build the benchmark clearly doesn't affect a model's performance on it.