Comment by gniv
5 days ago
From that thread: "The model solved P1 through P5; it did not produce a solution for P6."
It's interesting that it didn't solve the problem that was by far the hardest for humans too. China, the #1 team, got only 21/42 points on it. On most other teams, nobody solved it.
In the IMO, the idea is that on the first day you get P1, P2 and P3, and on the second day you get P4, P5 and P6. Usually, ordered by difficulty, they are P1, P4, P2, P5, P3, P6. So usually P1 is "easy" and P6 is very hard. At least that is the intended order, but sometimes reality disagrees.
Edit: Fixed P4 -> P3. Thanks.
In this case P6 was unusually hard and P3 was unusually easy https://sugaku.net/content/imo-2025-problems/
Yikes. 30 years ago I would eat this stuff up and I was the lead dev on 3D engines.
Now I can't even make heads or tails of what P6 is even asking (^▽^)
You have P4 twice in there; the latter should be P3
That's very silly. They should do the order like this:
Day 1: P1 P3 P5 (odds)
Day 2: P2 P4 P6 (evens)
Then the problem # is the difficulty.
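A minimal illustrative sketch (my own, not from the thread) of how the two layouts map problem numbers to the intended difficulty ranks, assuming the usual P1, P4, P2, P5, P3, P6 ordering mentioned above; either layout still gives each day one easy, one medium and one hard problem:

    # Intended difficulty rank (1 = easiest, 6 = hardest) of each problem number
    # under the traditional layout (Day 1 = P1 P2 P3, Day 2 = P4 P5 P6),
    # assuming the usual easy-to-hard order P1, P4, P2, P5, P3, P6.
    traditional_rank = {1: 1, 4: 2, 2: 3, 5: 4, 3: 5, 6: 6}

    for day, problems in [(1, [1, 2, 3]), (2, [4, 5, 6])]:
        ranks = [traditional_rank[p] for p in problems]
        print(f"Traditional day {day}: problems {problems} -> difficulty ranks {ranks}")

    # Proposed layout: Day 1 = odd-numbered problems, Day 2 = even-numbered ones.
    # The problem number itself would then equal its difficulty rank, and each
    # day still gets one easy, one medium and one hard problem.
    for day, problems in [(1, [1, 3, 5]), (2, [2, 4, 6])]:
        print(f"Proposed day {day}: problems {problems} -> difficulty ranks {problems}")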
On one hand, it's very difficult to break traditions.
On the other hand, the order P1 P4 P2 P5 P3 P6 is not always true.
Usually there is only one geometry problem per day.
Some problems involve a brilliant trick, while others require analyzing many cases. You don't want two "long" problems on the same day. (Sometimes there is a solution that the Jury didn't see, and the problem effectively changes category.)
Some problems are difficult but have a nice easy/medium intermediate step that earns some points.
There are a lot of implicit restrictions that can affect the order of the problems.
Also, sometimes the Jury miscalculates how difficult a problem is, and it turns out easier or harder than expected. Or the Jury completely misses an easier alternative solution.
The only sure thing is the order in which they are printed on the paper.
I think someone from the Canadian team solved it, but overall very few did.
To me, this is a tell of human involvement in the model's solutions.
There is no reason why machines would do badly on exactly the problem that humans also did badly on, unless humans were prodding the machine towards a solution.
Also, there is no reason why the machine could not produce a partial or wrong answer to problem 6, which looks like survivorship bias to me, i.e. only correct solutions were cherry-picked.
There is at least one reason: it was a harder problem. Agreed that which IMO problems are hard for a human IMO participant and which are hard for an LLM are different things, but it seems like they should be at least positively correlated?
IMO problems are not hard. They are merely tricky. They test primarily pattern recognition capabilities, requiring that flash of insight to find the hidden clue.
So it's no wonder that AI can solve them so well. Neural networks are great at pattern recognition.
A better test is to ask the AI to come up with good Olympiad problems. I went ahead and tried, and the results are average.
While that's zero proof, since the data used for training is human-generated, you raise an interesting point: the financial stakes in LLM research are so high that we should be skeptical of all frontier results.
An internet-connected machine that reasons like humans would have been considered a fraud by default 5 years ago; it's not unthinkable that some researchers would fake it till they made it, but of course you need proof before making such an accusation.
> There is no reason why machines would do badly on exactly the problem which humans do badly as well
Unless the machine is trained to mimic the human thought process.
Maybe it’s a hint that our current training techniques can create models comparable to the best humans in a given subject, but that’s the limit.
We've hit the limit of 'our current training techniques'? This result literally used newly developed techniques that surprised researchers at OpenAI.
Noam Brown: 'This result is brand new, using recently developed techniques. It was a surprise even to many researchers at OpenAI.'
So your thesis is that these new techniques, which just produced unexpected breakthroughs, represent some kind of ceiling? That's an impressive level of confidence about the limits of methods we apparently just invented, in a field that seems to, if anything, be accelerating.
> Maybe it’s a hint that our current training techniques can create models comparable to the best humans in a given subject, but that’s the limit.
The IMO does not represent the best humans in a given subject: college competitions are at a much higher level than high school competitions, and there are even higher levels above that, since college competitions are still limited to students.
Lmao.
You know IMO questions are not all equally difficult, right? They're specifically designed to vary in difficulty. The reason that problem 6 is hard for both humans and LLMs is... it's hard! What a surprise.
Lol the OpenAI naysayers on this site are such conspiracy theorists.
There are many things that are hard for AIs for the same reason they're hard for humans. There are subtleties in complexity that make challenging things universally challenging.
Obviously the model was trained on human data, so its competencies lie in the areas of mathematics where humans have provided input over the years, but that isn't data contamination; that's how all humans learn. This model, like the contestants, never saw the questions before.