Comment by Davidzheng

7 months ago

Would like to see FrontierMath results. Don't have a lot of personal trust in HLE.

6 comments

Davidzheng

"Don't have a lot of personal trust in HLE."

Why?

AIPedant 7 months ago
A lot of the questions are simple subject matter knowledge, and some of them are multiple-choice. Asking LLMs multiple-choice questions is scientific malpractice: it is not interesting that statistical next-token predictors can attain superhuman performance on multiple-choice tests. We've all known since childhood that you can go pretty far on a Scantron by using surface heuristics and a vague familiarity with the material.
I will add that, as an unfair smell test, the very name "Humanity's Last Exam" implies an arrogant contempt for scientific reasoning, and I would not be at all surprised if they were corrupt in a similar way to FrontierMath and OpenAI - maybe xAI funded HLE in exchange for peeking at the questions.
- UltraSane 7 months ago
  
  "A lot of the questions are simple subject matter knowledge" Aren't most questions incredibly hard?
  
Davidzheng 7 months ago

I only know math, and of the 2 example math questions, I think one is wrong. So based on the very limited data I have, I don't really trust their problems. OK, I'm not completely sure about my claim.