Comment by bwfan123

17 days ago

> So what do we really learn?

We will learn if the magical capabilities attributed to these tools are really true or not. Capabilities like they can magically solve any math problem out there. This is important because AI hype is creating the narrative that these tools can solve PhD level problems and this will dis-infect that narrative. In my book, any tests that refute and dispel false narratives make a huge contribution.

1 comment

bwfan123

data_maan 15 days ago

> We will learn if the magical capabilities attributed to these tools are really true or not.

They're not. We already know that. FrontierMath. Yu Tsumura's 553th problem, RealMath benchmark. The list goes on. As I said many times on this thread, there is nothing novel in this benchmark.

This fact that this benchmark is so hyped shows that the community knows nothing, NOTHING, about prior work in this space, which makes me sad.