Comment by sebastiennight
1 year ago
This is incorrect. If you take the most basic interpretation of an LLM at temperature 0 as predicting the most likely token, and you run it on, say, 1,000 runs of "complete this Spanish sentence with the word for 'X'", then:
- maybe ALL humans would fail the test in some way, eg. let's say everybody gets at least 10 of those wrong, and the average person gets 100 of those wrong.
- still, as long as most people correctly get each word right, your LLM would get every single response correct (because for each item in the test, 900+ people out of a thousand gave the same correct answer in the training set).
In that sense, it's totally possible for a system trained on a vast vat of average-human input to generate super-human outputs.
But still, the questions in that test are "solved" in the sense of "I can take a dictionary and answers these questions with full certainty". Beyond established knowledge LLMs are monkeys with typewriters, at best.
I’d like to see you ace even a middle-school level Spanish test with just a dictionary (sub Spanish with some other language if you happen to know Spanish).
It was a figure of speech. But there is nothing superintelligent about acing Spanish tests. Give me a Riemann hypothesis.
1 reply →