Comment by btown

2 years ago

As described in the OP's blog post https://adamkarvonen.github.io/machine_learning/2024/01/03/c... - one of the incredible things here is that the standard GPT architecture, trained from scratch on PGN strings alone, can intuit the rules of the game from those examples, without any built-in notion of the rules of chess or even that it is playing a game.

Leela, by contrast, requires a specialized structure of iterative tree searching to generate move recommendations: https://lczero.org/dev/wiki/technical-explanation-of-leela-c...

Which is not to diminish the work of the Leela team at all! But I find it fascinating that an unmodified GPT architecture can build up internal neural representations that correspond closely to board states, despite not having been designed for that task. As they say, attention may indeed be all you need.
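
For the "internal representations" point: the post checks this by training linear probes on the model's internal activations to read off what piece sits on each square. A toy sketch of that idea, not the author's code; the layer choice, shapes, and data below are made-up stand-ins:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Pretend we've cached the residual-stream activation at some layer
    # for N positions, plus a label for what occupies one square (say e4).
    N, D = 2_000, 512                              # positions, hidden size (assumed)
    acts = np.random.randn(N, D)                   # stand-in for real GPT activations
    e4_contents = np.random.randint(0, 13, size=N) # 12 piece types + empty (fake labels)

    probe = LogisticRegression(max_iter=1000).fit(acts[:1500], e4_contents[:1500])
    print("held-out probe accuracy:", probe.score(acts[1500:], e4_contents[1500:]))

On random data this hovers around chance; the striking result in the post is that on real activations such a linear probe recovers the board state far above chance.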

What's the strength of play for the GPT architecture? It's impressive that it figures out the rules, but does it play strong chess?

>> As they say, attention may indeed be all you need.

I don't think drawing general conclusions about intelligence from a board game is warranted. We didn't evolve to play chess or Go.

  • > What's the strength of play for the GPT architecture?

    Pretty shit for a computer. He says his 50M-parameter model reached 1800 Elo (by the way, it's Elo and not ELO as the article has it; the rating is named after a Hungarian guy called Elo). From the bar graph it seems to be a bit better than Stockfish level 1 and a bit worse than Stockfish level 2.

    Based on what we know, I think it's not surprising these models can learn to play chess, but they get absolutely smoked by a "real" chess bot like Stockfish or Leela.

    • Afaik his small bot reaches 1300 and gpt-3.5-instruct reaches 1800. We have no idea how much data, or what kind of PGNs, the OpenAI model was trained on. I heard a rumor that they specifically trained it on games up to 1800, but no idea if that's true.

    • They also say “I left one training for a few more days and it reached 1500 ELO.” I find it quite likely the observed performance is largely limited by the compute spent.

  • I can't see it being superhuman, that's for sure. Chess AIs are superhuman because they do vast searches, and I can't see that being replicated by an LLM architecture.

  • > What's the strength of play for the GPT architecture? It's impressive that it figures out the rules, but does it play strong chess?

    sometimes it is not a matter of "is it better? is it larger? is it more efficient?", but just a question.

    mountains are mountains, men are men.

> can intuit the rules of the game from those examples,

I am pretty sure a bunch of matrix multiplications can't intuit anything.

naively, it doesn't seem very surprising that enormous amounts of self play cause the internal structure to reflect the inputs and outputs?

  • It's not self-play. It's literally just reading sequences of moves. And it doesn't even know that they're moves, or that it's supposed to be learning a game. It's just learning to predict the next token given a sequence of previous tokens.

    What's kind of amazing is that, in doing so, it actually learns to play chess! That is, the model weights naturally organize into something resembling an understanding of chess, just by trying to minimize error on next-token prediction.

    It makes sense, but it's still kind of astonishing that it actually works.
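
    To make "just predicting the next token" concrete: the entire training signal is pairs like these, sliced out of raw PGN text (character-level here for illustration; the real tokenization in the post may differ):

        # A rough sketch of how training examples come out of a PGN string.
        pgn = "1.e4 e5 2.Nf3 Nc6 3.Bb5 a6"

        examples = [(pgn[:i], pgn[i]) for i in range(1, len(pgn))]
        for context, target in examples[:5]:
            print(repr(context), "->", repr(target))
        # The model only ever sees pairs like these; "chess" is never mentioned.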

  • > I am pretty sure a bunch of matrix multiplications can't intuit anything.

    I don't understand how people can say things like this when universal approximation is an easy thing to prove. You could reproduce Magnus Carlsen's exact chess-playing stochastic process with a bunch of matrix multiplications and nonlinear activations, up to arbitrarily small error.
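
    As a toy illustration of that point (nothing to do with the article's model): a single layer of random ReLU features plus a least-squares readout, i.e. nothing but matrix multiplications and a pointwise nonlinearity, already approximates a nonlinear function closely:

        import numpy as np

        rng = np.random.default_rng(0)
        x = np.linspace(0, 2 * np.pi, 500)[:, None]
        W, b = rng.normal(size=(1, 200)), rng.normal(size=200)

        H = np.maximum(x @ W + b, 0.0)                          # hidden layer: ReLU(xW + b)
        w_out, *_ = np.linalg.lstsq(H, np.sin(x), rcond=None)   # linear readout
        print("max abs error:", np.abs(H @ w_out - np.sin(x)).max())  # small

    Add more features (or depth) and the error can be driven down as far as you like on a bounded domain.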

    • I read such statements as being claims that "intuition" is part of consciousness etc.

      It's still too strong a claim given that matrix multiplication also describes quantum mechanics and by extension chemistry and by extension biology and by extension our own brains… but I frequently encounter examples of mistaking two related concepts for synonyms, and I assume in this case it is meant to be a weaker claim about LLMs not being conscious.

      Me, I think the word "intuition" is fine, just like I'd say that a tree falling in a forest with no one to hear it does produce a sound because sound is the vibration of the air instead of the qualia.

    • This simply isn't true. There are big caveats to the idea that neural networks are universal function approximators (as there are to the idea that they're universal Turing machines, which also somehow became common knowledge in our post-ChatGPT world). The function has to be continuous, we're talking about functions rather than algorithms, an approximator being possible and us knowing how to construct it are very different things, and so on.
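
      For reference, the classical statement (Cybenko/Hornik style) really does carry those hypotheses: for a suitable fixed activation \sigma (a sigmoid in Cybenko's version) and any continuous f on a compact set K \subset \mathbb{R}^n,

          \forall \varepsilon > 0 \;\; \exists N, \alpha_i, w_i, b_i : \quad \sup_{x \in K} \Big| f(x) - \sum_{i=1}^{N} \alpha_i \, \sigma(w_i^\top x + b_i) \Big| < \varepsilon

      and it says nothing about discontinuous targets, algorithms, or how to actually find those weights.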

  • We really need a list of verbs we're allowed to use when talking about computers, and verbs that belong in the magic human/animal-only section.

    • You are probably joking, but I think it's actually very important to look at the language we use around LLMs, in order not to get stuck in assumptions and sociological bias associated with a vocabulary usually reserved for "magical" beings, as it were.

      This goes both ways, by the way. I could be convinced that LLMs can achieve something like intuition, but I strongly believe it is a very different kind of intuition than the one we normally associate with humans/animals. Using the same label is thus potentially confusing, and (human pride aside) might even prevent us from appreciating the full scope of what LLMs are capable of.

  • > naively, it doesn't seem very surprising that enormous amounts of self play cause the internal structure to reflect the inputs and outputs?

    Right. Wait, are you talking about AI or humans?

  • Can a bunch of neurons firing based on chemical and electrical triggers intuit anything? It has to be the case that any intelligent process must be the emergent result of non-intelligent processes, because intelligence is not an inherent property of anything.

I think that “intuit the rules” is just projecting.

More likely, the 16 million games simply cover most of the piece-move combinations. It does not know that a knight moves in an L; it knows, for each square, where a knight can move, based on 16 million games.

  • No, this isn’t likely. Chess has trillions of possible games[1] that could be played, and if all it took was such a small number of games to hit most piece combinations, chess would be solved. It has to have learned some fundamental aspects of the game to achieve the rating stated ITT.

    1. https://en.m.wikipedia.org/wiki/Shannon_number#:~:text=After....

    • It doesn’t take consuming all trillions of possible game states to see the majority of ways a piece can move from one square to another.

      Maybe I misread something, as I only skimmed, but the pretty weak Elo would most definitely suggest a failure to intuit the rules.

  • On a board with a finite number of squares, is this truly different?

    The representation of the ruleset may not be optimal in terms of Kolmogorov complexity, but for an experienced human player who can glance at a board and know what is and isn’t legal, who is to say that their mental representation of the rules is optimizing for Kolmogorov complexity either?

  • You assert something that is a hypothesis for further research in the area. The alternative is that it in fact knows that knights move in an L shape. The article is about testing hypotheses like that, except this particular one seems quite hard.

    • It'd seem surprising to me if it had really learnt the generalization that knights move in an L shape, especially since its model of the board position seems to be more probabilistic than exact. We don't even know if its representation of the board is spatial or not (e.g. that columns a & b are adjacent, or that rows 1 & 3 are two rows apart).

      We also don't know what internal representations of the state of play it's using other than what the author has discovered via probes... Maybe it has other representations that effectively encode where pieces are (or what they may do next), beyond just the board position.

      I'm guessing that it's just using all of its learned representations to recognize patterns where, for example, Nf3 and Nh3 are both statistically likely, and has no spatial understanding of the relationship between these moves.

      I guess one way to explore this would be to generate a controlled training set where each knight only ever makes a different subset of its legal (up to) 8 moves, depending on which square it is on. Will the model learn the generalization that all L-shaped moves are possible from any square, or will it memorize the different subset of moves that "are possible" from each individual square?
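
      A quick sketch of what that controlled move set could look like (my own construction, square names only; actually generating the games is left out):

          # Enumerate all knight moves on an empty board, then keep only a fixed,
          # square-dependent subset for the training games to use.
          FILES, RANKS = "abcdefgh", "12345678"
          JUMPS = [(1, 2), (2, 1), (2, -1), (1, -2), (-1, -2), (-2, -1), (-2, 1), (-1, 2)]

          def knight_moves(sq):
              f, r = FILES.index(sq[0]), RANKS.index(sq[1])
              return [FILES[f + df] + RANKS[r + dr]
                      for df, dr in JUMPS
                      if 0 <= f + df < 8 and 0 <= r + dr < 8]

          squares = [f + r for f in FILES for r in RANKS]
          allowed  = {s: knight_moves(s)[::2]  for s in squares}  # seen in training
          held_out = {s: knight_moves(s)[1::2] for s in squares}  # never seen
          print(allowed["g1"], held_out["g1"])   # ['h3', 'f3'] ['e2']

      If the model ever produces a held-out move from the right square, that's evidence for the generalization rather than per-square memorization.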
