Comment by wbhart
6 hours ago
Lean is an interactive prover, not an automated prover. Last year, a lot of human effort was required to formalise the problems in Lean before the machines could get to work. This year the systems take natural language input and produce natural language output, and much faster.
The advantage of Lean is that the system checks the solutions, so hallucination is impossible. Of course, one still relies on the problems and solutions being translated to and from natural language correctly.
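To make "the system checks the solutions" concrete, here's a toy sketch (a trivial statement, nothing like an actual IMO problem; the theorem name is made up for illustration). The point is that the Lean kernel verifies the proof term, and a wrong one would simply be rejected:

```lean
-- A trivially easy statement of the kind Lean can machine-check.
-- `Nat.add_comm` is a lemma in Lean 4's core library. If the proof
-- term were wrong, the kernel would reject the whole declaration,
-- which is why an accepted Lean proof can't be a hallucination.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```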
Some people prefer difficult-to-read, formally checked solutions over informal but readable ones. The two approaches are just solving different problems.
But there is another important reason to want to do this reliably in natural language: you can't use Lean for most other domains (with a few limited exceptions). They want to train their RL pipelines for general intelligence and make them reliable on long-horizon problems. If a tool is needed as a crutch, that more or less demonstrates that LLMs won't be enough in any domain, and we'll have to wait for traditional AI to catch up in every domain.
Oh, I didn't realize that last year the problem formalization was a human effort; I assumed the provers themselves took the problem and created the formalization. Is this step actually harder to automate than solving the problem once it's formalized?
Anyway, mainly I was curious whether using an interactive prover like Lean would have provided any advantage, or whether that's no longer really the case. My initial take would be that yes, it should provide a huge advantage. As in chess and go, it would let the system search algorithmically through a huge space of proof steps and check which approaches get it closer to a resolution, with the AI "only" responsible for deciding which approaches to try.
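Here's a minimal sketch of the kind of guided search I mean, in plain Python rather than anything Lean-specific. The goal space is a toy (reduce an integer to 0), and the "tactics", function names, and heuristic are all made up for illustration; in a real setup the states would be Lean goals and the checker would be Lean's kernel:

```python
# Toy best-first search over "tactic" applications: a heuristic (the
# AI's job) decides what to try next, while a checker (Lean's job in
# a real system) guarantees every accepted step is valid.
import heapq

# Toy tactics: each maps a goal state to a new state, or fails (None).
TACTICS = {
    "halve":     lambda n: n // 2 if n % 2 == 0 else None,
    "decrement": lambda n: n - 1 if n > 0 else None,
    "sub_three": lambda n: n - 3 if n >= 3 else None,
}

def heuristic(n: int) -> int:
    # Score how far a goal is from being closed (0 = solved).
    return abs(n)

def best_first_search(start: int, max_steps: int = 10_000):
    """Expand the most promising goal first; return the tactic
    sequence (the 'proof') once the goal is closed."""
    frontier = [(heuristic(start), start, [])]
    seen = {start}
    for _ in range(max_steps):
        if not frontier:
            return None
        _, goal, proof = heapq.heappop(frontier)
        if goal == 0:                  # goal closed: "proof" found
            return proof
        for name, tactic in TACTICS.items():
            new_goal = tactic(goal)    # the checker: invalid steps fail
            if new_goal is not None and new_goal not in seen:
                seen.add(new_goal)
                heapq.heappush(
                    frontier,
                    (heuristic(new_goal), new_goal, proof + [name]),
                )
    return None

if __name__ == "__main__":
    print(best_first_search(37))   # e.g. ['decrement', 'halve', ...]
```

The division of labour is the whole point: the heuristic only has to rank candidate moves, because the checker ensures no invalid step ever enters the search tree.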
OTOH, maybe not. Maybe the search space is so big that trying to go through it exhaustively is a waste of CPU, in which case the translation to Lean plausibly offers no benefit. And now that I think about it, I could imagine that. When doing problems like these, you kind of have to figure out the overall approach end to end first, then fill in any gaps in your logic; the formalization/writing step is about the last thing you do. So I could see how starting with formalization could end up being the wrong approach for IMO-level problems. It'd just be nice to have that confirmed.
The cool thing is that, if true, it implies this is something completely different from the chess/go engines that rely on sheer computational power. Not so much a "Deep Blue" moment as an existential one.