Comment by data_maan

15 days ago

> We will learn if the magical capabilities attributed to these tools are really true or not.

They're not. We already know that. FrontierMath. Yu Tsumura's 553th problem, RealMath benchmark. The list goes on. As I said many times on this thread, there is nothing novel in this benchmark.

This fact that this benchmark is so hyped shows that the community knows nothing, NOTHING, about prior work in this space, which makes me sad.