Comment by AIPedant

3 days ago

"Simple" is unfair to the humans who discovered that knowledge, but not to the LLM. The point is that such questions are indistinguishable from niche trivia - the questions aren't actually "hard" in a cognitive sense, merely esoteric as a matter of surface feature identification + NLP. I don't know anything about hummingbird anatomy but I am not interested in hummingbirds and haven't read papers about them. Does it make sense to say such questions are "hard?" Are we talking about hardness of a trivia game, or actual cognitive ability? And it's frustrating to see these lumped into computational questions, analysis questions, etc etc. What exactly is HLE benchmarking? It is not a scientifically defensible measurement. It seems like the express purpose of the test is

a) to make observers say "wow those questions sure are hard!" without thinking carefully about what that means for an LLM versus a human

b) to let AI folks sneer that the LLM might be smarter than you because it can recite facts about category theory and you can't

(Are my cats smarter than you because they know my daily habits and you don't? The conflation of academically/economically useful knowledge with "intelligence" is one of AI's dumbest and longest-standing blunders.)