Comment by bawolff
18 days ago
A paper that isn't about benchmarking or ML research turns out to be bad from a benchmarking perspective. Not exactly a shocker.
The authors themselves literally state: "Unlike other proposed math research benchmarks (see Section 3), our question list should not be considered a benchmark in its current form"
On the website https://1stproof.org/#about they claim: "This project represents our preliminary efforts to develop an objective and realistic methodology for assessing the capabilities of AI systems to autonomously solve research-level math questions."
Sounds to me like a benchmark in all but name. And they failed pretty terribly at achieving what they set out to do.
> And they failed pretty terribly at achieving what they set out to do.
Why the angst? If the AI can autonomously solve these problems, isn't that a huge step forward for the field?
It's not angst. It's intense frustration that 1) they are not doing the science correctly, and 2) others (e.g. FrontierMath) already did everything they claim to be doing, so we won't learn anything new here, yet somehow 1stproof gets all the credit.