Comment by Kim_Bruning

4 months ago

It depends on the model and the person? I have this wicked tiny benchmark that includes worlds with odd physics, told through multiple layers of unreliable narration. Older AI had trouble with these; but some of the more advanced models now ace the test in its original form. (I'm going to need a new test.)