Comment by thomasahle
2 days ago
It took Andrew Wiles 7 years of intense work to solve Fermat's Last Theorem.
The METR institute predicts that the length of tasks AI agents can complete doubles every 7 months.
We should expect it to take until 2033 before AI solves Clay Institute-level problems with 50% reliability.
That's exactly why the Millennium Prize Problem Bench[1] was created.
1. https://mppbench.com/
That's amazing :D
There is an ongoing effort to formalize a modern, streamlined proof of FLT in Lean, with all the needed prereqs. It's estimated that it will take approx. 5 years, but perhaps AI will lead to some meaningful speedup.
What I'm hoping to see is high volume automated formalization of the math literature, with the goal of formalizing (or finding flaws in) the entire thing.
And once we have that formalized corpus, it's all set up as training data for moving forward.
We can't really have across-the-board formalization of the math literature without getting the basics done first (including the whole undergrad curriculum) which is what the mathlib folks are working on. It will in fact be interesting to see if AI can meaningfully speed up that work (although they seem to be bottlenecked on review and merging at the moment, not new contribs per se. So a "coding" AI workflow may be a bit of a closer fit.)
If you have a sufficiently strong verifier 1/100000 reliability is already enough
Sure, but then 50% reliability just becomes a matter of whether you can make a strong enough verifier.