Comment by sjducb

2 years ago

I’m curious how human-like this LLM feels when you play it.

One of the challenges in making fun chess bots is getting them to play like a low or mid ranked human. A Stockfish-based bot knows some very strong moves, but deliberately plays bad moves so that it lands at about the right skill level. The problem is that these bad moves are often very obvious. For example, I’ll threaten a queen capture. Any human would see it and move their queen. The bot “blunders” and loses the queen to the obvious attack. It feels like the bot is letting you win, which kills the enjoyment of playing against it.
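To make this concrete, the naive nerfing usually looks something like the sketch below (python-chess driving a local Stockfish; the binary path, search depth, and loss budget are all made-up placeholders):

```python
import random

import chess
import chess.engine

# Hypothetical path; adjust to wherever your Stockfish binary lives.
engine = chess.engine.SimpleEngine.popen_uci("/usr/bin/stockfish")

def nerfed_move(board: chess.Board, max_cp_loss: int = 150) -> chess.Move:
    """Pick any move within max_cp_loss centipawns of the engine's best."""
    # Ask the engine to score every legal move.
    infos = engine.analyse(board, chess.engine.Limit(depth=10),
                           multipv=board.legal_moves.count())
    best = infos[0]["score"].relative.score(mate_score=100000)
    # Keep every move inside the loss budget. The score says nothing about
    # whether a human would plausibly make that particular mistake.
    candidates = [info["pv"][0] for info in infos if "pv" in info and
                  best - info["score"].relative.score(mate_score=100000) <= max_cp_loss]
    return random.choice(candidates)
```

Every candidate inside the budget is equally likely, whether it’s a subtle positional slip or hanging the queen to a one-move threat, which is exactly where the “letting you win” feeling comes from.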

I think this approach would create very human-like games.

There is a very interesting project on this exact problem called Maia, which trains an engine on millions of human games played on Lichess, specifically targeting skill levels from 1300 to 1900 Elo. I haven't played it myself, but my understanding is that it does a much better job of imitating the mistakes of human players. https://maiachess.com
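If you want to try it, my understanding is that Maia ships as ordinary lc0 weights and is meant to be queried at a single node, so the move comes straight from the human-trained policy network with no search on top. A rough sketch (the lc0 invocation and weights filename here are assumptions, not verified):

```python
import chess
import chess.engine

# Maia is distributed as lc0 network weights; the filename is a placeholder.
engine = chess.engine.SimpleEngine.popen_uci(["lc0", "--weights=maia-1500.pb.gz"])

board = chess.Board()
# nodes=1 means no lookahead: the policy network's top move is played as-is.
result = engine.play(board, chess.engine.Limit(nodes=1))
print(result.move)
engine.quit()
```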

What I'm most interested in is what an LLM trained on something specific like this (even though chess, arguably, isn't super specific) would have to say in human words about its strategies and moves, especially with some kind of higher-order language.

And the reverse: can a human situation be expressed as a chess position presented with a move?

Isn't that more of a design issue than a bot AI issue?

  • Humans and machines find good moves in different ways.

    Most humans have fast pattern matching that is quite good at finding some reasonable moves.

    There are also classes of moves that all humans will spot (you just moved your bishop; now it’s pointing at my queen).

    The problem is that Stockfish scores every move with a single number for how good it is. You have no idea whether a human would agree.

    For example, miscalculating a series of trades four moves deep is a very human mistake, but it’s scored the same as moving the bishop to a square where it can easily be taken by a pawn: both leave you a bishop down. A nerfed Stockfish bot is equally likely to play either of those moves.

    You might think you could hand-write a list of dumb move types for the bot to play, but there are thousands of possible obviously dumb moves; that coverage problem is exactly what machine learning is for. One such hand-written rule is sketched below.
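    As a sketch of why the hand-written approach doesn’t scale, here is a detector for just one class of obviously dumb move, written with python-chess (simplified; it ignores en passant and castling corner cases):

    ```python
    import chess

    def hangs_to_a_pawn(board: chess.Board, move: chess.Move) -> bool:
        """Does this move park the piece where an enemy pawn takes it for free?"""
        board.push(move)
        try:
            # After the push it is the opponent's turn; find their attackers
            # of the square we just landed on.
            attackers = board.attackers(board.turn, move.to_square)
            attacked_by_pawn = any(board.piece_type_at(sq) == chess.PAWN
                                   for sq in attackers)
            # Is the landed piece defended by anything of ours?
            defended = bool(board.attackers(not board.turn, move.to_square))
            return attacked_by_pawn and not defended
        finally:
            board.pop()
    ```

    And that covers exactly one blunder pattern; every other class of obvious mistake needs its own rule.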

  • I'd call it an approach issue: LLM vs brute-force lookahead.

    An LLM is predicting what comes next per its training set. If it's trained on human games, it should play like a human; if it's trained on Stockfish games, it should play more like Stockfish.

    Stockfish, or any chess engine using brute-force lookahead, is just trying to find the optimal move, not copying any style of play, so its moves are sometimes going to look very un-human. Imagine the human player is looking 10-15 moves ahead while Stockfish looks 40-50 ahead: what looks good 40-50 moves out can be quite different from what looks good to the human.
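    For intuition, here is a toy frequency-model version of "predict what comes next from the training set" (a sketch with python-chess; the PGN filename is a placeholder, and a real LLM generalizes to unseen positions instead of doing table lookup):

    ```python
    import random
    from collections import Counter, defaultdict

    import chess
    import chess.pgn

    # Count, for every position in a file of human games, which move was played.
    move_counts = defaultdict(Counter)
    with open("lichess_games.pgn") as f:
        while (game := chess.pgn.read_game(f)) is not None:
            board = game.board()
            for move in game.mainline_moves():
                move_counts[board.fen()][move.uci()] += 1
                board.push(move)

    def human_like_move(board: chess.Board) -> str:
        """Sample a move in proportion to how often humans played it here."""
        counts = move_counts[board.fen()]
        if not counts:  # position never seen in the data
            return random.choice([m.uci() for m in board.legal_moves])
        return random.choices(list(counts), weights=list(counts.values()))[0]
    ```

    Trained this way, the model's mistakes are by construction the mistakes humans actually make, which is the whole point of the approach upthread.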