Comment by npinsker
6 days ago
The trouble is, getting an IMO gold medal is much easier (by frequency) than being the #1 Go player in the world, which was achieved by AI 10 years ago. I'm not sure it's enough to just gesture at the task; drilling down into precisely how it was achieved feels important.
(Not to take away from the result, which I'm really impressed by!)
The "AI" that won Go was Monte Carlo tree search on a neural net "memory" of the outcome of millions of previous games; this is an LLM solving open-ended problems. The tasks are hardly even comparable.
A "reasoning LLM" might not be conceptually far from MCTS.
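For reference, the selection step at the heart of MCTS/UCT is the UCB1 rule: pick the child maximizing average reward plus an exploration bonus. A minimal sketch on a made-up two-move "game" (the move names and win rates are hypothetical, chosen only to illustrate how visits concentrate on the better move):

```python
import math
import random

def ucb1(total_reward, child_visits, parent_visits, c=1.41):
    # UCB1: exploitation (mean reward) + exploration bonus; unvisited
    # children are tried first. This is the selection rule used by UCT.
    if child_visits == 0:
        return float("inf")
    return total_reward / child_visits + c * math.sqrt(
        math.log(parent_visits) / child_visits
    )

random.seed(0)
moves = ["a", "b"]
true_win_rate = {"a": 0.7, "b": 0.3}  # hidden from the search
reward = {m: 0.0 for m in moves}
visits = {m: 0 for m in moves}

for t in range(1, 2001):
    # Selection: choose the move with the highest UCB1 score.
    move = max(moves, key=lambda m: ucb1(reward[m], visits[m], t))
    # "Rollout": simulate an outcome from the hidden win rate.
    outcome = 1.0 if random.random() < true_win_rate[move] else 0.0
    # Backpropagation: update statistics for the chosen move.
    visits[move] += 1
    reward[move] += outcome

print(visits)  # visits concentrate on the stronger move "a"
```

In full AlphaGo the rollout/evaluation step is replaced by the policy/value networks, but the select-simulate-backpropagate loop is the same shape.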
I really don't like the use of the word memory here, even in quotes. AlphaGo has a much better "understanding" of Go positions than mine (7k).
And then they created AlphaGo Zero, which is not trained on any previous games, and it was even stronger!
https://deepmind.google/discover/blog/alphago-zero-starting-...
AlphaGo Zero was also trained on millions of games; they just weren't games played by human players.
Nothing that uses a mathematical model for solving a problem will ever reason because reasoning can only be done by things we don't understand...