Comment by zahlman

23 days ago

> I gave 7 frontier LLMs a simple task: pilot a drone through a 3D voxel world and find 3 creatures.

> Only one could do it.

If I understood the chart correctly, even the successful one only found 1/6 of the creatures across multiple runs.

zahlman

No science detected.

Without comparison to some null hypothesis (a random policy), this article is hogwash.
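
For what it's worth, a random-policy baseline is cheap to estimate. Below is a rough sketch, not the article's actual setup: the grid size, step budget, the rule that a creature counts as "found" when the drone enters its cell, and the function name `random_policy_baseline` are all my own assumptions for illustration.

```python
import random


def random_policy_baseline(size=16, n_creatures=3, max_steps=500,
                           episodes=200, seed=0):
    """Estimate the mean fraction of creatures found by a random-walk drone.

    Hypothetical setup (not taken from the article): a size^3 grid, creatures
    placed uniformly at random in distinct cells, and a creature is "found"
    when the drone occupies its cell.
    """
    if n_creatures > size ** 3:
        raise ValueError("more creatures than cells")

    rng = random.Random(seed)
    # Six axis-aligned unit moves; the policy picks one uniformly each step.
    moves = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    total_found = 0

    for _ in range(episodes):
        # Place creatures in distinct random cells.
        creatures = set()
        while len(creatures) < n_creatures:
            creatures.add(tuple(rng.randrange(size) for _ in range(3)))

        pos = tuple(rng.randrange(size) for _ in range(3))
        found = set()

        # Check the start cell, then the cell reached after each move.
        for _ in range(max_steps + 1):
            if pos in creatures:
                found.add(pos)
            step = rng.choice(moves)
            # Clamp to the grid so the drone stays inside the world.
            pos = tuple(min(max(p + d, 0), size - 1) for p, d in zip(pos, step))

        total_found += len(found)

    return total_found / (episodes * n_creatures)


if __name__ == "__main__":
    print(f"random policy finds {random_policy_baseline():.1%} of creatures")
```

Whatever number this prints for a world matched to the article's parameters is the bar a model's success rate would need to clear before it means anything.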