Comment by Kim_Bruning
6 days ago
It depends on the model and the person? I have this wicked tiny benchmark that includes worlds with odd physics, told through multiple layers of unreliable narration. Older models had trouble with these, but some of the more advanced ones now ace the test in its original form. (I'm going to need a new test.)
For instance, how does your AI do on this question? https://pastebin.com/5cTXFE1J (the answer is "off")