Comment by gniv
5 days ago
From that thread: "The model solved P1 through P5; it did not produce a solution for P6."
It's interesting that it didn't solve the problem that was by far the hardest for humans too. China, the #1 team, got only 21/42 points on it. On most other teams, nobody solved it.
In the IMO, the idea is that on the first day you get P1, P2 and P3, and on the second day you get P4, P5 and P6. Usually, ordered by difficulty, they are P1, P4, P2, P5, P3, P6. So usually P1 is "easy" and P6 is very hard. At least that is the intended order, but sometimes reality disagrees.
Edit: Fixed P4 -> P3. Thanks.
In this case P6 was unusually hard and P3 was unusually easy https://sugaku.net/content/imo-2025-problems/
Yikes. 30 years ago I would eat this stuff up and I was the lead dev on 3D engines.
Now I can't even make heads or tails of what P6 is even asking (^▽^)
You have P4 twice in there; the latter should be P3
That's very silly. They should do the order like this:
Day 1: P1 P3 P5 (odds)
Day 2: P2 P4 P6 (evens)
Then the problem # is the difficulty.
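A minimal illustrative sketch (my own, not from the thread) of how the two layouts map problem numbers to the intended difficulty ranks, assuming the usual P1, P4, P2, P5, P3, P6 ordering mentioned above; either layout still gives each day one easy, one medium and one hard problem:

    # Intended difficulty rank (1 = easiest, 6 = hardest) of each problem number
    # under the traditional layout (Day 1 = P1 P2 P3, Day 2 = P4 P5 P6),
    # assuming the usual easy-to-hard order P1, P4, P2, P5, P3, P6.
    traditional_rank = {1: 1, 4: 2, 2: 3, 5: 4, 3: 5, 6: 6}

    for day, problems in [(1, [1, 2, 3]), (2, [4, 5, 6])]:
        ranks = [traditional_rank[p] for p in problems]
        print(f"Traditional day {day}: problems {problems} -> difficulty ranks {ranks}")

    # Proposed layout: Day 1 = odd-numbered problems, Day 2 = even-numbered ones.
    # The problem number itself would then equal its difficulty rank, and each
    # day still gets one easy, one medium and one hard problem.
    for day, problems in [(1, [1, 3, 5]), (2, [2, 4, 6])]:
        print(f"Proposed day {day}: problems {problems} -> difficulty ranks {problems}")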
On one hand, it's very difficult to break traditions.
On the other hand, the order P1 P4 P2 P5 P3 P6 is not always true.
Usually there is only one geometry problem per day.
Some problems involve a brilliant trick, while others require analyzing many cases. You don't want two "long" problems on the same day. (Sometimes there is a solution that the Jury didn't see, and the problem effectively changes category.)
Some problems are difficult but have a nice easy/medium intermediate step that earns some points.
There are a lot of implicit restrictions that can affect the order of the problems.
Also, sometimes the Jury miscalculates how difficult a problem is, and it turns out easier or harder than expected. Or the Jury completely misses an easier alternative solution.
The only sure thing is the order in which they are printed on the paper.
I think someone from the Canadian team solved it, but overall very few did.
To me, this is a tell of human involvement in the model's solutions.
There is no reason why machines would do badly on exactly the problem that humans also did badly on, unless humans were prodding the machine towards a solution.
Also, there is no reason why the machine could not produce a partial or wrong answer to problem 6, which looks like survivorship bias to me, i.e. only correct solutions were cherry-picked.
There is at least one reason: it was a harder problem. Agreed that which IMO problems are hard for a human IMO participant and which are hard for an LLM are different things, but it seems like they should be at least positively correlated?
IMO problems are not hard. They are merely tricky. They test primarily pattern recognition capabilities, requiring that flash of insight to find the hidden clue.
So it's no wonder that AI can solve them so well. Neural networks are great at pattern recognition.
A better test is to ask the AI to come up with good Olympiad problems. I went ahead and tried, and the results are average.
While that's zero proof, since the data used for training is human-generated, you raise an interesting point: the financial stakes in LLM research are so high that we should be skeptical of all frontier results.
An internet-connected machine that reasons like humans would have been considered a fraud by default 5 years ago; it's not unthinkable that some researchers would fake it till they made it, but of course you need proof before making such an accusation.
> There is no reason why machines would do badly on exactly the problem which humans do badly as well
Unless the machine is trained to mimic the human thought process.
Maybe it’s a hint that our current training techniques can create models comparable to the best humans in a given subject, but that’s the limit.
We've hit the limit of 'our current training techniques'? This result literally used newly developed techniques that surprised researchers at OpenAI.
Noam Brown: 'This result is brand new, using recently developed techniques. It was a surprise even to many researchers at OpenAI.'
So your thesis is that these new techniques, which just produced unexpected breakthroughs, represent some kind of ceiling? That's an impressive level of confidence about the limits of methods we apparently just invented, in a field that seems to, if anything, be accelerating.
> Maybe it’s a hint that our current training techniques can create models comparable to the best humans in a given subject, but that’s the limit.
The IMO does not represent the best humans in a given subject: college competitions are at a much higher level than high school competitions, and there are even higher levels above that, since college competitions are still limited to students.
Lmao.
You know IMO questions are not all equally difficult, right? They're specifically designed to vary in difficulty. The reason that problem 6 is hard for both humans and LLMs is... it's hard! What a surprise.
Lol the OpenAI naysayers on this site are such conspiracy theorists.
There are many things that are hard for AIs for the same reason they're hard for humans. There are subtleties in complexity that make challenging things universally challenging.
Obviously the model was trained on human data, so its competencies lie in the areas of mathematics where humans have provided input over the years, but that isn't data contamination; that's how all humans learn. This model, like the contestants, never saw the questions before.