Comment by goatlover

2 years ago

What's the strength of play for the GPT architecture? It's impressive that it figures out the rules, but does it play strong chess?

>> As they say, attention may indeed be all you need.

I don't think drawing general conclusions about intelligence from a board game is warranted. We didn't evolve to play chess or Go.

> What's the strength of play for the GPT architecture?

Pretty shit for a computer. He says his 50M model reached 1800 Elo (by the way, it's Elo, not ELO as the article incorrectly has it; it's named after a Hungarian guy called Elo). From the bar graph it looks a bit better than Stockfish level 1 and a bit worse than Stockfish level 2.

Based on what we know, I think it's not surprising that these models can learn to play chess, but they get absolutely smoked by a "real" chess bot like Stockfish or Leela.

  • Afaik his small bot reaches 1300 and gpt-3.5-instruct reaches 1800. We have no idea how many games, or what kind of PGNs, the OpenAI model was trained on. I heard a rumor that they specifically trained it on games up to 1800 Elo, but I have no idea whether that's true.

  • They also say, “I left one training for a few more days and it reached 1500 ELO.” I find it quite likely that the observed performance is largely limited by the amount of compute spent.

I can't see it being superhuman, that's for sure. Chess AIs are superhuman because they do vast searches, and I can't see that being replicated by an LLM architecture.

  • The apples-to-apples comparison would be an LLM versus Leela with search turned off (only evaluating a single board state).

    According to figure 6b [0], removing MCTS reduces Elo by about 40%, i.e. the search-free network keeps roughly 60% of the rating. Scaling 1800 Elo back up by 5/3 (dividing by 0.6) gives about 3000 Elo, which would be superhuman but not as good as e.g. LeelaZero. (A quick arithmetic sketch follows this thread of replies.)

    [0]: https://gwern.net/doc/reinforcement-learning/model/alphago/2...

    • Leela's policy network alone is around 2600 Elo, or around the level of a strong grandmaster. Note that Go is different from chess since there are no draws, so skill differences are greatly magnified. Elo is a relative scale in any case (expected score is based on the Elo difference), so multiplying a rating shouldn't really make sense anyway.

    • I don’t think 3000 is superhuman though; it's peak human, as IIRC Magnus had an Elo of 3000 at one point.

  • Any particular reason why that shouldn't work well with fine-tuning of an LLM using reinforcement learning?

  • Chess AIs used to dominate through sheer computational power, but to my knowledge that's no longer true; the engines beat all but the very strongest players even when run on phone CPUs.
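A minimal Python sketch of the Elo arithmetic from the sub-thread above, assuming the ~40% figure read off figure 6b and the standard Elo expected-score formula; the 1800 and 3000 numbers are back-of-the-envelope only, and the last two lines illustrate why Elo is a relative scale rather than something you can meaningfully multiply.

    # Illustrative sketch only -- the 40% and 1800 figures come from the comments above.

    def expected_score(rating_a: float, rating_b: float) -> float:
        """Standard Elo expected score for A against B; only the rating
        difference matters, which is why Elo is a relative scale."""
        return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

    # If removing MCTS costs ~40% of the rating, the search-free model keeps ~60%,
    # so scaling back up means dividing by 0.6 (equivalently, multiplying by 5/3).
    no_search_elo = 1800
    naive_full_estimate = no_search_elo / 0.6       # = 3000
    print(f"naive scaled estimate: {naive_full_estimate:.0f} Elo")

    # Why multiplying ratings is dubious: expected score depends only on the difference.
    print(round(expected_score(3000, 2800), 2))     # 0.76
    print(round(expected_score(1200, 1000), 2))     # 0.76 -- identical, despite very different ratings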

> What's the strength of play for the GPT architecture? It's impressive that it figures out the rules, but does it play strong chess?

Sometimes it is not a matter of "is it better? is it larger? is it more efficient?", but just a question.

Mountains are mountains, men are men.