Comment by Kim_Bruning

6 days ago

It depends on the model and the person? I have this wicked tiny benchmark that includes worlds with odd physics, told through multiple layers of unreliable narration. Older AI models had trouble with these, but some of the more advanced models now ace the test in its original form. (I'm going to need a new test.)

For instance, how does your AI do on this question? https://pastebin.com/5cTXFE1J (the answer is "off")