Comment by rfurmani

1 day ago

Sounds like it did not:

> This year, our advanced Gemini model operated end-to-end in natural language, producing rigorous mathematical proofs directly from the official problem descriptions – all within the 4.5-hour competition time limit

I interpreted that bit as meaning they did not manually alter the problem statement before feeding it to the model - they gave it the exact problem text issued by IMO.

It is not clear to me from that paragraph if the model was allowed to call tools on its own or not.

  • As a side question, do you think using tools like Lean will become a staple of these "deep reasoning" LLM flavors?

    It seems that LLMs excel (relative to other paradigms) in the kind of "loose" creative thinking humans do, but are also prone to the same kinds of mistakes humans make (hallucinations, etc). Just as Lean and other formal systems can help humans find subtle errors in their own thinking, they could do the same for LLMs.

    • I was surprised to see them not using tools for it, that feels like a more reliable way to get useful results for this kind of thing.

      I get the impression not using tools is part of the point though - to help demonstrate how much mathematical "reasoning" you can get out of just a model on its own.

Yes, that quote is contained in my comment. But I don't think it unambiguously excludes tool use in the internal chain of thought.

I don't think tool use would detract from the achievement, necessarily. I'm just interested to know.

  • "End-to-end in natural language" would imply no tool use, I'd imagine. Unless it called another tool which converted the output back to natural language, but that would be a real stretch (smoke and mirrors).