code here: https://github.com/adamkarvonen/chess_llm_interpretability
I'm impressed and surprised that a relatively small model can learn so much from just the textual move records. Not even full algebraic notation 1.e2-e4 e7-e5 2.Ng1-f3, but just 1.e4 e5 2.Nf3 ... It has to figure out what O-O and O-O-O mean just from which King and Rook moves appear later. And the only way it could learn that the King cannot move through a checked square while castling is that such situations never appear in its training set.
Would love to see a similar experiment for 9x9 Go, where the model also needs to learn the concepts of connected group and its liberties.
I'd be curious to see if, in the 1-2% of cases where the linear probe fails to predict board occupancy, the LLM also predicts (or at least assigns non-trivial probability to) a corresponding illegal move. For example, if the linear probe incorrectly thinks there's a bishop on b4, does the LLM give more probability to illegal bishop moves along the corresponding diagonals than to other illegal bishop moves?
Nice experiment, even though we know that LLMs distill an internal world model representation of whatever they are trained on.
The experiment could be a little better by using a more descriptive form of notation than PGN. PGN notation's strength is its shorthand, because it is used by humans while playing the game. That is far from being a strength as LLM training data. ML algorithms, and LLMs in particular, train better when fed more descriptive and accurate data, and verbosity is not a problem at all. There is FEN notation, in which the entire board is encoded at every move.
One could easily imagine many different ways to describe a game: encoding the vertical and horizontal lines, listing exactly which squares each piece covers, on which color squares, which pieces are able to move, and generating a whole page describing the board situation for every move.
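To make the idea concrete, here is a minimal sketch (using the python-chess library, which is just my choice for illustration, not something the article's experiment uses) that expands a terse PGN move list into an explicit FEN string after every move:

```python
# Minimal sketch: expanding a compact PGN move list into per-move FEN strings.
# python-chess is used purely for illustration; the experiment in the article
# trains on the raw PGN text only.
import io
import chess.pgn

pgn_text = "1.e4 e5 2.Nf3 Nc6 3.Bb5 a6 *"
game = chess.pgn.read_game(io.StringIO(pgn_text))

board = game.board()
for move in game.mainline_moves():
    san = board.san(move)                 # the terse human-oriented token, e.g. "Nf3"
    board.push(move)
    print(f"{san:6s} -> {board.fen()}")   # the fully explicit board state afterwards
```

Every line on the right is a complete snapshot of the position, which is exactly the information the compact SAN tokens leave implicit.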
I call this spatial navigation, in which the LLM learns the ins and outs of its training data and is able to make more informed guesses. Chess is fun and all, but code generation has the potential to be a lot better than just writing functions. By feeding the LLM the AST representation of the code, the tree of workspace files, public items, and module hierarchy alongside the code, it could be a significant improvement.
> Nice experiment, even though we know that LLMs distill an internal world model representation of whatever they are trained on.
There are still a lot of people who deny that (for example Bender's "superintelligent octopus" supposedly wouldn't learn a world model, no matter how much text it trained on), so more evidence is always good.
> There is FEN notation, in which the entire board is encoded at every move.
The entire point of this is to not encode the board state!
>The entire point of this is to not encode the board state!
I am not sure about this. From the article "The 50M parameter model played at 1300 ELO with 99.8% of its moves being legal within one day of training."
I thought that the experiment was about how well the model would perform given that its objective is to predict text rather than to checkmate. Leela and AlphaZero's reward function is to win the game: checkmate or capture pieces. Also, it goes without saying that Leela and AlphaZero cannot make illegal moves.
The experiment does not need to include the whole board position if keeping that out is an important point of interest. It could, for example, encode more information about the squares covered by each side. See for example this training experiment for Trackmania [1]. There are techniques that the ML algorithm will *never* figure out by itself if this information is not encoded in its training data.
The point still stands: PGN notation is certainly not a good format if the goal (or one of the goals) of the experiment is to produce a good chess player.
[1]https://www.youtube.com/watch?v=Dw3BZ6O_8LY
> By feeding the LLM the AST representation of the code, the tree of workspace files, public items, and module hierarchy alongside the code, it could be a significant improvement.
Aider does this, using tree-sitter to build a “repository map”. This helps the LLM understand the overall code base and how it relates to the specific coding task at hand.
https://aider.chat/docs/repomap.html
More broadly, I agree with your sentiment that there is a lot of value in considering the best ways to structure the data we share with LLMs. Especially in the context of coding.
>Aider does this, using tree-sitter to build a “repository map”. This helps the LLM understand the overall code base and how it relates to the specific coding task at hand.
Great stuff.
>More broadly, I agree with your sentiment that there is a lot of value in considering the best ways to structure the data we share with LLMs. Especially in the context of coding.
As Microsoft's experiments with Phi-1 and Phi-2 show, training data makes a difference. The "Textbooks Are All You Need" motto means that better-structured, clearer data makes a difference.
https://arxiv.org/abs/2306.11644
> The experiment could be a little better by using a more descriptive form of notation than PGN
The author seems more interested in the ability to learn chess at a decent level from such a poor input, as well as what kind of world model it might build, rather than wanting to help it to play as well as possible.
The fact that it was able to build a decent model of the board position from PGN training samples, without knowing anything about chess (or that it was even playing chess) is super impressive.
It seems simple enough to learn that, for example, "Nf3" means that an "N" is on "f3", especially since predicting well requires you to know what piece is on each square.
However, what is not so simple is having to learn - without knowing a single thing about chess - that "Nf3" also means that:
1) One of the 8 squares that are a knight's move away from f3, and that had an "N" on it, now has nothing on it. There's a lot going on there!
2) If "f3" previously had a different piece on it, that piece is now gone (taken) - it should no longer also be associated with "f3"
If you take a neural network that already knows the basic rules of chess and train it on chess games, you produce a chess engine.
From the Wikipedia page on one of the strongest ever[1]: "Like Leela Zero and AlphaGo Zero, Leela Chess Zero starts with no intrinsic chess-specific knowledge other than the basic rules of the game. Leela Chess Zero then learns how to play chess by reinforcement learning from repeated self-play"
[1]: https://en.wikipedia.org/wiki/Leela_Chess_Zero
As described in the OP's blog post https://adamkarvonen.github.io/machine_learning/2024/01/03/c... - one of the incredible things here is that the standard GPT architecture, trained from scratch from PGN strings alone, can intuit the rules of the game from those examples, without any notion of the rules of chess or even that it is playing a game.
Leela, by contrast, requires a specialized structure of iterative tree searching to generate move recommendations: https://lczero.org/dev/wiki/technical-explanation-of-leela-c...
Which is not to diminish the work of the Leela team at all! But I find it fascinating that an unmodified GPT architecture can build up internal neural representations that correspond closely to board states, despite not having been designed for that task. As they say, attention may indeed be all you need.
What's the strength of play for the GPT architecture? It's impressive that it figures out the rules, but does it play strong chess?
>> As they say, attention may indeed be all you need.
I don't think drawing general conclusions about intelligence from a board game is warranted. We didn't evolve to play chess or Go.
> can intuit the rules of the game from those examples,
I am pretty sure a bunch of matrix multiplications can't intuit anything.
Naively, it doesn't seem very surprising that enormous amounts of self-play cause the internal structure to reflect the inputs and outputs?
I think that “intuit the rules” is just projecting.
More likely, the 16 million games simply contain most of the piece-move combinations. It does not know that a knight moves in an L. It knows, from 16 million games, where a knight can move from each square.
Thanks for linking the actual post—it was a great read. I'm not an ML expert, but the author really made it easy to follow their experiment's method and results.
I hope it performs better than ChatGPT: https://old.reddit.com/r/AnarchyChess/comments/10ydnbb/i_pla...
Though I will give it to ChatGPT, castling across the bishop was a genius move.
I've definitely played against some children who would respawn pieces. Human level AI clearly.
> I fine-tuned GPT-2 on a 50 / 50 mix of OpenWebText and chess games, and it learned to play chess and continued to output plausible looking text. Maybe there’s something interesting to look at there?
To me that suggests investigating whether there are aspects of human culture that can improve chess-playing performance - i.e. whether training on games alone produces worse results than training on games and literature.
This seems plausible to me, even beyond literature that is explicitly about the game - learning go proverbs, which are often phrased as life advice, is a part of learning go, and games are embedded all through our culture, with some stories really illustrating that you have to 'know when to hold em, know when to fold em, know when to walk away, know when to run'.
I’ve skimmed this, but if it is really true that it can play at 1800 ELO based purely on the moves, without seeing the board at each turn, that is insane. 1800 ELO is a strong human rating even with seeing the board; 1800 ELO essentially blindfolded is incredible.
I’m curious how human like this LLM feels when you play it.
One of the challenges in making fun chess bots is getting them to play like a low- or mid-ranked human. The problem is that a Stockfish-based bot knows some very strong moves, but deliberately plays bad moves so it’s about the right skill level. These bad moves are often very obvious. For example, I’ll threaten a queen capture. Any human would see it and move their queen. The bot “blunders” and loses the queen to an obvious attack. It feels like the bot is letting you win, which kills the enjoyment of playing with it.
I think that this approach would create very human like games.
There is a very interesting project on this exact problem called Maia, which trains an engine based on millions of human games played on Lichess, specifically targeting varying levels of skill from 1300 to 1900 Elo. I haven't played it myself, but my understanding is that it does a much better job imitating the mistakes of human players. https://maiachess.com
What I'm most interested in is what an LLM trained on something specific like this (even though chess, arguably, isn't super specific) has to say in human words about their strategies and moves, especially with some kind of higher order language.
And the reverse, can a human situation be expressed as a chessboard presented with a move?
> The problem is that a Stockfish-based bot knows some very strong moves, but deliberately plays bad moves so it’s about the right skill level.
What are you basing this on? To me it seems like difficulty is set by limiting search depth/time: https://github.com/lichess-org/fishnet/blob/master/src/api.r...
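For what it's worth, here is a sketch of the two usual knobs, via python-chess and a local Stockfish binary (assumed to be on your PATH; the numbers are arbitrary examples, not what Lichess actually uses):

```python
# Sketch: the two common ways to weaken Stockfish over UCI, using python-chess.
# Assumes a local "stockfish" binary; values below are illustrative only.
import chess
import chess.engine

engine = chess.engine.SimpleEngine.popen_uci("stockfish")
engine.configure({"Skill Level": 5})                       # Stockfish's built-in handicap, 0-20

board = chess.Board()
result = engine.play(board, chess.engine.Limit(depth=6))   # and/or cap search depth/time
print(result.move)
engine.quit()
```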
Based on playing bots and watching them do a series of good moves followed by a “blunder” that no human would ever make.
Isn't that more of a design issue than a bot AI issue?
Humans and machines find good moves in different ways.
Most humans have fast pattern matching that is quite good at finding some reasonable moves.
There are also classes of moves that all humans will spot. (You just moved your bishop, now it’s pointing at my queen)
The problem is that Stockfish scores all moves with a number based on how good the move is. You have no idea if a human would agree.
For example, miscalculating a series of trades 4 moves deep is a very human mistake, but it’s scored the same as moving the bishop to a square where it can easily be taken by a pawn. They both result in you being a bishop down. A nerfed Stockfish bot is equally likely to play either of those moves.
You might think that you could have a list of dumb move types that the bot might play, but there are thousands of possible obviously dumb moves. This is a problem for machine learning.
I'd call it an approach issue: LLM vs brute-force lookahead.
An LLM is predicting what comes next per its training set. If it's trained on human games then it should play like a human; if it's trained on Stockfish games, then it should play more like Stockfish.
Stockfish, or any chess engine using brute-force lookahead, is just trying to find the optimal move - not copying any style of play - and its moves are therefore sometimes going to look very un-human. Imagine if the human player is looking 10-15 moves ahead, but Stockfish 40-50 moves ahead... what looks good 40-50 moves out might be quite different than what looks good to the human.
I mean, this seems obvious to me. How would the model predict the next move WITHOUT calculating the board state first? Yes, by memorization, but the memorization hypothesis is easily rejected by comparison to the training dataset in this case.
It is possible the model calculates an approximate board state, which is different from the true board state but equivalent for most games, though not all. It would be interesting to train an adversarial policy to check this. From the KataGo attack we know this does happen for Go AIs: Go rules have a concept of liberty, but so-called pseudo-liberties are easier to calculate and equivalent in most cases (but not all). In fact, human programmers also used pseudo-liberties to optimize their engines. The adversarial attack found that Go AIs use pseudo-liberties too.
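For anyone unfamiliar with the term, here is a toy illustration of true liberties versus pseudo-liberties (the board and group below are made up for the example):

```python
# True liberties vs. pseudo-liberties for a group on a tiny 4x4 "Go board".
# Pseudo-liberties count an empty point once per adjacent stone in the group,
# so shared liberties get counted multiple times; cheap to update incrementally
# and usually, but not always, equivalent to the real count.
EMPTY = "."
board = [
    list(".X.."),
    list(".XX."),
    list("...."),
    list("...."),
]

def neighbors(r, c, n=4):
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        if 0 <= r + dr < n and 0 <= c + dc < n:
            yield r + dr, c + dc

group = [(0, 1), (1, 1), (1, 2)]  # the three connected black stones

true_libs = {p for s in group for p in neighbors(*s) if board[p[0]][p[1]] == EMPTY}
pseudo_libs = sum(1 for s in group for p in neighbors(*s) if board[p[0]][p[1]] == EMPTY)

print(len(true_libs), pseudo_libs)  # 6 vs 7: the shared liberty is double-counted
```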
Surprisingly many people seem to believe LLMs cannot form any deeper world models beyond superficial relationships between words, even if figuring out a "hidden" model allows for a big leap in prediction performance – in this case, a hypothesis corresponding to chess rules happens to give the best bang for the buck for predicting strings that have chess-notation structure.
But the model could in principle just have learned a long list of rote heuristics that happen to predict notation strings well, without having made the inferential leap to a much simpler set of rules, and a learner weaker than an LLM could well have got stuck at that stage.
I wonder how well a human (or a group of humans) would fare at the same task and if they could also successfully reconstruct chess even if they had no prior knowledge of chess rules or notation.
(OTOH a GPT3+ level LLM certainly does know that chess notation is related to something called "chess", which is a "game" and has certain "rules", but to what extent is it able to actually utilize that information?)
It’s one thing to think it’s obvious, but quite another to prove it. I think the true value of this kind of work is that it’s helping to decipher what these models are actually doing. Far too often we hear “NNs / LLMs are black boxes” as if that’s the end of the conversation.
> It is possible the model calculates an approximate board state
Yes - this is exactly what the probes show.
One interesting aspect is that it still learns to play when trained on blocks of move sequences starting from the MIDDLE of the game, so it seems it must be incrementally inferring the board state by what's being played rather than just by tracking the moves.
I'm impressed and surprised that a relatively small model can learn so much from just the textual move records (not even full algebraic notation 1.e2-e4 e7-e5 2.Ng1-f3 but just 1.e4 e5 2.Nf3).
Would love to see a similar experiment for 9x9 Go, where the model also needs to learn the concepts of connected group and its liberties.
The 'world model' question seems "not even understood" by those in the field who provide these answers to it -- and use terms like "concepts" (see the linked paper on sentiment where the NN has apparently discovered a sentiment "concept").
Consider the world to contain causal properties which bring about regularities in text, e.g., Alice likes chocolate so Alice says, "I like chocolate". Alice's liking, i.e., her capacity for preference, desire, taste, aesthetic judgement, etc., is the cause of "like".
Now these causal properties bring about significant regularities in text, so "like" occurring early in a paragraph comes to be extremely predictive of other text tokens occurring (e.g., b-e-s-t, etc.)
No one in this debate doubts, whatsoever, that NNs contain "subnetworks" which divide the problem up into detecting these token correlations. This is trivially observable in CNNs, where it is easy to demonstrate subnetworks "activating" on, say, an eye-shape.
The issue is that when a competent language user judges someone's sentiment, or the implied sentiment the speaker of some text would have -- they are not using a model of how some subset of terms (like, etc.) comes to be predictive of others.
They're using the fact that they know the relevant causal properties (liking, preference, desire, etc.) and how these cause certain linguistic phrases. It is for this reason that a competent language user can trivially detect irony ("of course I like going to the dentist!" -- here, since we know how unlikely it is to desire this, we know this phrase is unlikely to express such a preference, etc.).
To say that NNs, or any ML system, are sensitive to these mere correlations is not to say that these correlations are not formed by tracking the symptoms of real causes (e.g., desire). Rather, it is to say they do not track desire.
This seems obvious, since the mechanism to train them is just sensitive to patterns in tokens. These patterns are not their causes, and are not models of their causes. They're only predictive of them under highly constrained circumstances.
Astrological signs are predictive of birth dates, but they aren't models of being born -- nor of time, or anything else.
No one here doubts whether NNs are sensitive to patterns in text caused by causal properties -- the issue is that they aren't models of these properties; they are models of (some of) their effects as encoded in text.
To be fair, the term "world model" does not presume scientific understanding, factfulness or causality.
In an ideal AI model this would be the aim though.
Then it isn't a model of the world.
If the term, "effect model" were used there would be zero debate. Of course NNs model the effects of sentiment.
The debate is that AI hype artists don't merely claim to model effects in constrained domains.
>Astrological signs are predictive of birth dates, but they aren't models of being born -- nor of time, or anything else.
Also eating ice cream and getting bitten by a shark do have some mutual predictive associations.
I think that the chess-GPT experiment can be interesting not because the machine can predict every causal connection, but because of how many causal connections it can extract from the training data by itself. By putting a human in the loop, many more causal connections would be revealed, but the human is lazy. Or expensive. Or expensive because he is lazy.
In addition, correlation can be a hint of causation. If a human researches it further, maybe it turns out to be a correlation and nothing substantial, but sometimes it may actually be a causal effect. So there is value in that.
About the overall sentiment: an NN's world model is very different from a human world model indeed.
I'm curious as to what practical difference you think this distinction makes? (not being sarcastic, I just don't see it)
If you understand the cause of a regularity, you will predict it in all relevant circumstances. If you're just creating a model of its effects in one domain, you can only predict it in that domain --- with all other factors held constant.
This makes (merely) predictive models extremely fragile; as we often see.
One worry about this fragility is safety: no one doubts that, say, city route planning from 1bn+ images is done via a "pixel-correlation (world) model" of pedestrian behaviour. The issue is that it isn't a model of pedestrian behaviour.
So it is only effective insofar as the effects of pedestrian behaviour, as captured in the images, in these environments, etc. remain constant.
If you understood pedestrians, i.e., people, then you could imagine their behaviour in arbitrary environments.
Another way of putting it is: correlative models of effects aren't sufficient for imagining novel circumstances. They encode only the effects of causes in those circumstances.
Whereas if you had a real world model, you could trivially simulate arbitrary circumstances.
Is a linear probe part of observability/interpretability?
Yes, a pretty fundamental technique and one of the earliest. It lets you determine which layers contain what information, among other things.
The downside is that it's a supervised technique, so you need to already know what you're looking for. It would be nice to have an unsupervised tool that could list out all the things the network has learned.
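To show how simple the technique is, here is a minimal sketch of a linear probe in PyTorch; the shapes, labels and data below are stand-ins for illustration, not the author's actual setup:

```python
# Minimal linear-probe sketch (hypothetical shapes and stand-in data).
# Given hidden activations from one layer of the model and the known board
# state at each move, fit a single linear layer that predicts the piece on
# every square. High held-out accuracy suggests the layer encodes board state.
import torch
import torch.nn as nn

d_model, n_squares, n_classes = 512, 64, 13     # 13 = 6 piece types x 2 colors + empty
acts = torch.randn(10_000, d_model)             # activations at each move token (fake data)
labels = torch.randint(0, n_classes, (10_000, n_squares))  # true piece on each square (fake)

probe = nn.Linear(d_model, n_squares * n_classes)
opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for _ in range(100):
    logits = probe(acts).view(-1, n_squares, n_classes)
    loss = loss_fn(logits.reshape(-1, n_classes), labels.reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()
```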
Would love to see a similar experiment for 9x9 Go, where the model also needs to learn the concepts of connected group and its liberties.
World model might be too big a word here. When we talk of a world model (in the context of AI models), we refer to its understanding of the world, at least in the context we trained it on. But what I see is just a visualization of the output in a fashion similar to a chess board. Stronger evidence would be, for example, a map of the next move, which would show whether it truly understood the game’s rules. If it shows probability larger than zero on illegal squares, that will show us why it sometimes makes illegal moves. And, obviously, that it didn’t fully understand the rules of the game.
> probability larger than zero
Strictly speaking, it would be a mistake to assign a probability of exactly zero to any move, even an illegal one, but especially for an AI that learns by example and self-play. It never gets taught the rules, it only gets shown the games -- there's no reason that it should conclude that the probability of a rook moving diagonally is exactly zero just because it's never seen it happen in the data, and gets penalized in training every time it tries it.
But even for a human, assigning probability of exactly zero is too strong. It would forbid any possibility that you misunderstand any rules, or forgot any special cases. It's a good idea to always maintain at least a small amount of epistemic humility that you might be mistaken about the rules, so that sufficiently overwhelmingly strong evidence could convince you that a move you thought was illegal turns out to be legal.
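Mechanically, a softmax output can't produce an exact zero anyway, so a next-token model always leaves some sliver of probability on illegal continuations:

```python
# A softmax never outputs an exact zero, so a next-token model always assigns
# some (possibly vanishingly small) probability to "illegal" tokens.
import torch

logits = torch.tensor([10.0, 0.0, -20.0])
print(torch.softmax(logits, dim=0))   # all three entries are strictly positive
```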
The rules of chess are small and well known. For example, rooks can't go diagonal no matter the situation. There's no need for epistemic humility.
There's got to be a probability cut-off, though. LLMs don't infinitely connect every token with every other token; some aren't connected at all, even if some association is taught, right?
No, it is not a visualization of the output, it is a visualization of the information about pawn position contained in the model’s internal state.