Comment by tedsanders

1 year ago

Your issue is that the performance of these models at chess is incredibly sensitive to the prompt. If you have gpt-3.5-turbo-instruct complete a PGN transcript, then you'll see performance in the 1800 Elo range. If you ask in English or diagram the board, you'll see vastly degraded performance.
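
To make the difference concrete, here's a rough sketch of the two prompt styles. The helper names (`pgn_prompt`, `english_prompt`), the header values, and the English wording are all made up for illustration; this isn't the exact prompt anyone used, just the general shape of "give it a PGN transcript to continue" versus "ask in English". It only builds the strings and doesn't call any API.

```python
def pgn_prompt(san_moves, headers=None):
    """Format SAN moves as a PGN transcript that stops where the next move is due."""
    # Hypothetical header values; real PGN files usually carry more tags.
    headers = headers or {"Event": "Example game", "Result": "*"}
    header_lines = [f'[{key} "{value}"]' for key, value in headers.items()]

    parts = []
    for i, move in enumerate(san_moves):
        if i % 2 == 0:
            # White's moves are preceded by the move number, e.g. "1."
            parts.append(f"{i // 2 + 1}.")
        parts.append(move)
    if len(san_moves) % 2 == 0:
        # White is to move: end on the next move number so the model continues from it.
        parts.append(f"{len(san_moves) // 2 + 1}.")

    return "\n".join(header_lines) + "\n\n" + " ".join(parts)


def english_prompt(san_moves):
    """The same position posed as a plain-English question (illustrative wording)."""
    return (
        "We are playing chess. The moves so far are: "
        + ", ".join(san_moves)
        + ". What is your next move?"
    )


if __name__ == "__main__":
    moves = ["e4", "e5", "Nf3", "Nc6"]
    print(pgn_prompt(moves))
    print("---")
    print(english_prompt(moves))
```

The first string is what you'd hand to a completion model so the most likely continuation is simply the next move; the second is the style that tends to do much worse.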

Unlike with people, how you ask the question really, really affects the output quality.