Comment by computerex

1 year ago

The question here is why gpt-3.5-instruct can then beat stockfish.

15 comments

computerex

PS: I ran it and, as suspected, gpt-3.5-turbo-instruct does not beat stockfish; it is not even close: "Final Results: gpt-3.5-turbo-instruct: Wins=0, Losses=6, Draws=0, Rating=1500.00 stockfish: Wins=6, Losses=0, Draws=0, Rating=1500.00" https://www.loom.com/share/870ea03197b3471eaf7e26e9b17e1754?...

computerex 1 year ago
Maybe there's some difference in the setup, because the OP reports that the model beats stockfish (as they had it configured) every single game.
- golol 1 year ago
  
  You have to get the model to think in PGN data. It's crucial to use the exact PGN format it saw in its training data and to give it few-shot examples.
- Filligree 1 year ago
  
  OP had stockfish at its weakest preset.
  
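
To make the PGN suggestion above concrete, here is a minimal sketch of how such a prompt could be assembled. It is not the OP's actual setup: the function name `build_pgn_prompt`, the header values, and the `example_games` parameter are all hypothetical, and which headers a given model responds to best is an assumption to test, not a known fact.

```python
def build_pgn_prompt(san_moves, example_games=(), white="Player A", black="Player B"):
    """Render the game so far as PGN text that stops where the next move belongs.

    san_moves:     moves played so far in standard algebraic notation, e.g. ["e4", "e5"]
    example_games: optional complete PGN strings prepended as few-shot examples
                   (hypothetical parameter, for illustration only)
    """
    # Illustrative header tags; the exact tags used are an assumption.
    headers = [
        '[Event "Example Game"]',
        f'[White "{white}"]',
        f'[Black "{black}"]',
        '[Result "*"]',
    ]

    parts = []
    for i, move in enumerate(san_moves):
        if i % 2 == 0:
            # White's move: prefix it with the move number, e.g. "1."
            parts.append(f"{i // 2 + 1}.")
        parts.append(move)

    # If White is to move, end on the next move number so the
    # completion model continues with a move rather than a number.
    if len(san_moves) % 2 == 0:
        parts.append(f"{len(san_moves) // 2 + 1}.")

    current_game = "\n".join(headers) + "\n\n" + " ".join(parts)
    # Few-shot examples come first, separated by blank lines.
    return "\n\n".join(list(example_games) + [current_game])


if __name__ == "__main__":
    print(build_pgn_prompt(["e4", "e5", "Nf3"]))
```

The returned string would be sent as the prompt to a completion-style model, and the first token group of the reply parsed as the next move. The reply still has to be checked for legality before it is played.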

lukan 1 year ago

Cheating (using an internal chess engine) would be the obvious reason to me.

nske 1 year ago

But in that case there shouldn't be any invalid moves, ever. Another tester found gpt-3.5-turbo-instruct to be suggesting at least one illegal move in 16% of the games (source: https://blog.mathieuacher.com/GPTsChessEloRatingLegalMoves/ )
TZubiri 1 year ago
Nope. Calls via the API don't use function calls.
- girvo 1 year ago
  
  How can you prove this when talking about someone's internal closed API?
- permo-w 1 year ago
  
  that you know of
  

bluGill 1 year ago

The article appears to have only run stockfish at low levels. You don't have to be very good to beat it.

shric 1 year ago

I'm actually surprised any of them manage to make legal moves throughout the game once out of book moves.