PS: I ran it and, as suspected, gpt-3.5-turbo-instruct does not beat Stockfish. It is not even close.
"Final Results: gpt-3.5-turbo-instruct: Wins=0, Losses=6, Draws=0, Rating=1500.00 stockfish: Wins=6, Losses=0, Draws=0, Rating=1500.00"
https://www.loom.com/share/870ea03197b3471eaf7e26e9b17e1754?...
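Side note on the output above: both ratings still reading 1500.00 after a 6-0 result suggests the rating was never updated. A minimal sketch of the standard Elo update, assuming both sides start at 1500 and a K-factor of 32 (the K-factor and the function names are my assumptions, not taken from the script in the video):

```python
def expected_score(r_a, r_b):
    """Standard Elo expected score of player A against player B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def update(r_winner, r_loser, k=32.0):
    """Return the new (winner, loser) ratings after one decisive game.

    k=32 is an assumed K-factor; the original script may use another value.
    """
    gain = k * (1.0 - expected_score(r_winner, r_loser))
    return r_winner + gain, r_loser - gain


# Replay the reported result: six straight wins for stockfish.
stockfish, gpt = 1500.0, 1500.0
for _ in range(6):
    stockfish, gpt = update(stockfish, gpt)

print(f"stockfish={stockfish:.2f} gpt-3.5-turbo-instruct={gpt:.2f}")
```

With any positive K-factor the two ratings drift apart after the first decisive game, so identical final ratings point at the bookkeeping rather than the play.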
Maybe there's some difference in the setup, because the OP reports that the model beats Stockfish (as they had it configured) every single game.
You have to get the model to think in PGN data. It's crucial to use the exact PGN format it saw in its training data and to give it few-shot examples.
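To make that concrete, here is a rough sketch of building a PGN-style completion prompt. The header tags and values, the function name, and the `examples` parameter are all hypothetical choices for illustration; I don't know the exact format the OP used.

```python
def build_pgn_prompt(san_moves, examples=(), white="Player A", black="Player B"):
    """Build a PGN-style text prompt that ends where the next move should go.

    san_moves: moves played so far in SAN, e.g. ["e4", "e5", "Nf3"].
    examples:  optional complete PGN games (strings) used as few-shot examples.
    The header tags and values are placeholders, not a known-good recipe.
    """
    headers = [
        '[Event "Example Game"]',
        f'[White "{white}"]',
        f'[Black "{black}"]',
        '[Result "*"]',
    ]

    # Movetext: "1. e4 e5 2. Nf3 ..." with a move number before each White move.
    parts = []
    for i, san in enumerate(san_moves):
        if i % 2 == 0:
            parts.append(f"{i // 2 + 1}.")
        parts.append(san)

    # If White is to move, end on the next move number so the model continues it.
    if len(san_moves) % 2 == 0:
        parts.append(f"{len(san_moves) // 2 + 1}.")

    current_game = "\n".join(headers) + "\n\n" + " ".join(parts)
    return "\n\n".join(list(examples) + [current_game])


print(build_pgn_prompt(["e4", "e5", "Nf3"]))
```

The idea is that the text handed to a completion model looks like the middle of a game record, so the most natural continuation is the next move in SAN.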
OP had Stockfish at its weakest preset.
Cheating (using an internal chess engine) would be the obvious reason to me.
But in that case there shouldn't be any invalid moves, ever. Another tester found that gpt-3.5-turbo-instruct suggested at least one illegal move in 16% of the games (source: https://blog.mathieuacher.com/GPTsChessEloRatingLegalMoves/ )
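For a sense of scale, that per-game figure can be turned into a rough per-move rate. This is a back-of-the-envelope sketch only: it assumes illegal moves are independent events and that the model plays about 40 moves per game, and neither assumption comes from the linked post.

```python
def per_move_rate(game_rate, moves_per_game):
    """Per-move illegal-move probability implied by a per-game rate.

    Solves 1 - (1 - p) ** n = game_rate for p, assuming each of the n moves
    is independently illegal with probability p (a simplifying assumption).
    """
    return 1.0 - (1.0 - game_rate) ** (1.0 / moves_per_game)


# 0.16 is the per-game figure quoted above; 40 moves per game is an assumption.
p = per_move_rate(0.16, 40)
print(f"implied per-move illegal rate: {p:.4%}")
```

Under those assumptions the per-move rate comes out well below one percent, which is low but still nonzero, and a real engine would be at exactly zero.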
Nope. API calls don't use function calls.
How can you prove this when talking about someone's internal, closed API?
That you know of.
The article appears to have only run Stockfish at low levels. You don't have to be very good to beat it.
I'm actually surprised any of them manage to make legal moves throughout the game once they are out of book moves.