Comment by thomasahle

1 month ago

It took Andrew Wiles 7 years of intense work to solve Fermat's Last Theorem.

The METR institute predicts that the length of tasks AI agents can complete doubles every 7 months.

We should expect it to take until 2033 before AI solves Clay Institute-level problems with 50% reliability.

7 comments

thomasahle

kelseyfrog 1 month ago

That's exactly why the Millennium Prize Problem Bench[1] was created.

1. https://mppbench.com/

thomasahle 1 month ago

That's amazing :D

zozbot234 1 month ago

There is an ongoing effort to formalize a modern, streamlined proof of FLT in Lean, with all the needed prereqs. It's estimated that it will take approx. 5 years, but perhaps AI will lead to some meaningful speedup.

pfdietz 1 month ago
What I'm hoping to see is high volume automated formalization of the math literature, with the goal of formalizing (or finding flaws in) the entire thing.
And once we have that formalized corpus, it's all set up as training data for moving forward.
- zozbot234 1 month ago
  
  We can't really have across-the-board formalization of the math literature without getting the basics done first (including the whole undergrad curriculum) which is what the mathlib folks are working on. It will in fact be interesting to see if AI can meaningfully speed up that work (although they seem to be bottlenecked on review and merging at the moment, not new contribs per se. So a "coding" AI workflow may be a bit of a closer fit.)

Davidzheng 1 month ago

If you have a sufficiently strong verifier 1/100000 reliability is already enough

thomasahle 1 month ago

Sure, but then 50% reliability just becomes a matter of whether you can make a strong enough verifier.