Comment by Closi
1 year ago
Agree with this. A few prompt variants:
* What if you allow the model to do chain-of-thought reasoning (explicitly disallowed in this experiment)?
* What if you explain the board position to the model in the prompt at each step, so it doesn't have to track/estimate it internally? (See the sketch after this list.)
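For the second variant, here's a minimal sketch of what that loop could look like, assuming the python-chess library and a hypothetical `query_model()` helper standing in for whatever LLM API call the experiment uses:

```python
# Sketch: restate the full board position in the prompt before each move,
# instead of making the model reconstruct it from the move list alone.
import chess

def query_model(prompt: str) -> str:
    """Hypothetical stand-in for the LLM API call; returns one move in SAN."""
    raise NotImplementedError

board = chess.Board()
moves_so_far: list[str] = []

while not board.is_game_over():
    prompt = (
        f"Moves so far: {' '.join(moves_so_far) or '(none)'}\n"
        f"Current position (FEN): {board.fen()}\n"
        f"Board:\n{board}\n"  # str(board) renders an ASCII diagram
        "It is your move. Reply with a single legal move in SAN."
    )
    move_san = query_model(prompt)
    board.push_san(move_san)  # raises if the reply isn't a legal move
    moves_so_far.append(move_san)
```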
They also tested GPT-o1, which always uses CoT, and yet it still performed worse.