Comment by bawolff
18 days ago
A paper that isn't about benchmarking or ML research turns out to be bad from a benchmarking perspective. Not exactly a shocker.
The authors themselves literally state: "Unlike other proposed math research benchmarks (see Section 3), our question list should not be considered a benchmark in its current form"
On the website https://1stproof.org/#about they claim: "This project represents our preliminary efforts to develop an objective and realistic methodology for assessing the capabilities of AI systems to autonomously solve research-level math questions."
Sounds to me like a benchmark in all but name. And they failed pretty terribly at achieving what they set out to do.
> And they failed pretty terribly at achieving what they set out to do.
Why the angst? If the AI can autonomously solve these problems, isn't that a huge step forward for the field?
It's not angst. It's intense frustration that 1) they are not doing the science correctly, and 2) others (e.g. FrontierMath) already did everything they claim to be doing, so we won't learn anything new here, yet somehow 1stproof gets all the credit.