Comment by cubefox

8 hours ago

I found the proofs you were referring to:

Google https://storage.googleapis.com/deepmind-media/gemini/IMO_202...

OpenAI https://github.com/aw31/openai-imo-2025-proofs/

9 comments

cubefox

Gemini is clearer but MY GOD is it verbose. E.g., look at problem 1, section 2 ("Analysis of the Core Problem"): there's nothing at all deep here, but the model seems to want to spell out every single tiny logical step. I wonder if this is a stylistic choice or something that actually helps the model get to the end.

vessenes 6 hours ago
The extra steps actually do help, in that they give the model more computation time and also allow real-time management of the input context by the model. You can see the same behavior in the excessive comment writing some coding models engage in; Anthropic interviews said these do actually help the model.
- johnfn 2 hours ago
  
  Gemini did not one-shot these answers; it did its thinking elsewhere (probably not released by Google) and then it consolidated it down into what you see in the PDF. From the article:
  > We achieved this year’s result using an advanced version of Gemini Deep Think – an enhanced reasoning mode for complex problems that incorporates some of our latest research techniques, including parallel thinking. This setup enables the model to simultaneously explore and combine multiple possible solutions before giving a final answer, rather than pursuing a single, linear chain of thought.
  I don't see any parallel thinking in the PDF, for example, so that was probably elided from the final results.
  
shiandow 7 hours ago
Section 2 is a case by case analysis. Those are never pretty but perfectly normal given the problem.
With OpenAI that part takes up about 2/3 of the proof, even with its fragmented prose. I don't think it does much better.
- sweezyjeezy 6 hours ago
  
  It's not it being case by case that's my issue. I used to do olympiads, and e.g. for the k>=3 case I wouldn't write much more than:
  "Since there are 3k - 3 points on the perimeter of the triangle to be covered, and any sunny line can pass through at most two of them, it follows that 3k − 3 ≤ 2k, i.e. k ≤ 3."
  Gemini writes:
  > Let T_k be the convex hull of P_k. T_k is the triangle with vertices V_1 = (1, 1), V_2 = (1, k), V_3 = (k, 1). The edges of T_k lie on the lines x = 1 (V), y = 1 (H), and x + y = k + 1 (D). These lines are shady.
  > Let B_k be the set of points in P_k lying on the boundary of T_k. Each edge contains k points. Since the vertices are distinct (as k ≥ 2), the total number of points on the boundary is |B_k| = 3k − 3.
  > Suppose P_k is covered by k sunny lines L_k. These lines must cover B_k. Let L ∈ L_k. Since L is sunny, it does not coincide with the lines containing the edges of T_k. A line that does not contain an edge of a convex polygon intersects the boundary of the polygon at most at two points. Thus, |L ∩ B_k| ≤ 2. The total coverage of B_k by L_k is at most 2k. We must have |B_k| ≤ 2k. 3k − 3 ≤ 2k, which implies k ≤ 3.
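
For anyone who wants to sanity-check the counting argument quoted above, here is a rough brute-force sketch in Python. It assumes the problem's point set is P_k = {(a, b) : a, b ≥ 1, a + b ≤ k + 1} and that "sunny" means not parallel to the x-axis, the y-axis, or x + y = 0; the function names are made up for illustration and come from neither proof.

```python
from itertools import combinations

def boundary_points(k):
    """Points of P_k = {(a, b): a, b >= 1, a + b <= k + 1} on the triangle's edges."""
    return [(a, b)
            for a in range(1, k + 1)
            for b in range(1, k + 2 - a)
            if a == 1 or b == 1 or a + b == k + 1]

def is_sunny(p, q):
    """True if the line through p and q is not parallel to x = 0, y = 0, or x + y = 0."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    return dx != 0 and dy != 0 and dx + dy != 0

def max_boundary_hits_by_sunny_line(k):
    """Largest number of boundary points on any sunny line through two of them."""
    pts = boundary_points(k)
    best = 0
    for p, q in combinations(pts, 2):
        if not is_sunny(p, q):
            continue
        dx, dy = q[0] - p[0], q[1] - p[1]
        # r is on the line through p and q iff the cross product vanishes
        hits = sum(1 for r in pts if (r[0] - p[0]) * dy == (r[1] - p[1]) * dx)
        best = max(best, hits)
    return best

for k in range(2, 9):
    n_boundary = len(boundary_points(k))
    print(k, n_boundary, max_boundary_hits_by_sunny_line(k), n_boundary <= 2 * k)
```

Each row prints k, the boundary count, the most boundary points any sunny line hits, and whether k lines hitting two points each could cover the boundary at all. This only checks small cases numerically; it is not a substitute for the convexity argument.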
  

CamperBob2 4 hours ago

Kind of disappointing that neither provider shows the unsuccessful attack on problem 6.

cubefox 2 hours ago

They don't show any reasoning traces at all, just the final proofs. We must assume the traces are pretty huge, since at least Google makes it clear that they are heavily relying on inference compute:
> We achieved this year’s result using an advanced version of Gemini Deep Think – an enhanced reasoning mode for complex problems that incorporates some of our latest research techniques, including parallel thinking. This setup enables the model to simultaneously explore and combine multiple possible solutions before giving a final answer, rather than pursuing a single, linear chain of thought. [...] We will be making a version of this Deep Think model available to a set of trusted testers, including mathematicians, before rolling it out to Google AI Ultra subscribers.