Comment by daxfohl

15 hours ago

I tried something similar, but distilled to "solve this maze" as a first-person text adventure. While the model usually solved it eventually, it almost always backtracked through fully-explored dead ends multiple times before finally reaching the end. That surprised me; I expected the models to traverse more or less optimally most of the time.
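
For concreteness, each turn gave the model a first-person description of its current cell and asked for a move. A minimal sketch of that observation step (the grid encoding and names here are illustrative, not my exact harness):

    # Illustrative sketch of the per-turn observation, not my exact setup.
    # The maze is a grid of '#' walls; the model only sees its local view.
    WALL = "#"

    def observe(maze, pos):
        """Describe the four adjacent cells from the agent's position."""
        r, c = pos
        deltas = {"north": (-1, 0), "south": (1, 0), "east": (0, 1), "west": (0, -1)}
        lines = []
        for name, (dr, dc) in deltas.items():
            cell = maze[r + dr][c + dc]
            lines.append(f"To the {name}: {'a wall' if cell == WALL else 'an open passage'}.")
        return "\n".join(lines)

    maze = [
        "#####",
        "#..E#",
        "#.#.#",
        "#S..#",
        "#####",
    ]
    print(observe(maze, (3, 1)))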

I tried basic raw long-context chat; various ways of getting it to externalize the state (e.g. prompting it to emit the known state of the maze after each move, without dictating exactly what to emit or how to format it); and even letting it emit code to execute after each turn (so long as the code was a serialization/storage mechanism, not a solver in itself). It invariably got lost at some point. It always neglected to emit a key saying which coordinate was which and which direction was increasing. Even when I explicitly told it to include that key, it would eventually forget anyway and get turned around again. If I provided the key myself on every move, it usually worked.
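
A sketch of the kind of key I ended up injecting each move (the format here is illustrative):

    # Sketch of the coordinate key I injected each move (format illustrative).
    # Left to its own devices, the model would drop this convention mid-run.
    def state_header(pos):
        r, c = pos
        return (
            "KEY: position is (row, col); row increases going SOUTH, col increases going EAST.\n"
            f"You are at (row={r}, col={c}).\n"
            "Known map legend: '#' = wall, '.' = open, '?' = unexplored."
        )

    print(state_header((3, 1)))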

Of course it had no problem writing an optimal algorithm to solve mazes when prompted. In fact the whole setup basically wrote itself: I have no idea how to write a maze generator. The disparity struck me as interesting.
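
For comparison, the solver it wrote unaided was the textbook approach, plain BFS over the grid (a sketch in the same spirit, not its literal output):

    # The kind of optimal solver it produced without trouble: BFS returns a
    # shortest path from start to end, or None if the end is unreachable.
    from collections import deque

    def solve(maze, start, end):
        queue = deque([(start, [start])])
        seen = {start}
        while queue:
            (r, c), path = queue.popleft()
            if (r, c) == end:
                return path
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if maze[nr][nc] != "#" and (nr, nc) not in seen:
                    seen.add((nr, nc))
                    queue.append(((nr, nc), path + [(nr, nc)]))
        return None

    maze = [
        "#####",
        "#..E#",
        "#.#.#",
        "#S..#",
        "#####",
    ]
    print(solve(maze, (3, 1), (1, 3)))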

Note the mazes had the start and end positions in the interior of the maze itself, so they weren't trivially solvable by the "keep one hand on the wall" rule, which only reliably works when the entrance and exit are on the outer boundary.

This was last summer, so maybe newer models would do better. I also stopped due to cost.