
Comment by cgearhart

7 years ago

AlphaZero uses a neural network to estimate the probability of winning from a given state and a probability distribution over the next moves, but it still “just” uses Monte Carlo Tree Search (MCTS) to look for the strongest move to play based on the estimated score of each reachable state. In that respect it is identical to earlier agents, whether they used minimax, negamax/principal variation search, or MCTS. AlphaZero’s primary improvements are (1) learning without bootstrapping from human play, and (2) using the neural network’s output distribution over the available moves in each state to guide the search instead of a static heuristic (like killer-move ordering or UCB).
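To make the “static heuristic vs. learned prior” distinction concrete, here is a minimal sketch of my own (in Python; `Node`, `puct_score`, and the constants are hypothetical names, not from the paper): classic UCB1 selection computes its exploration bonus from visit counts alone, while the PUCT-style rule AlphaZero uses scales the bonus by the policy network’s prior P(s, a), so moves the network likes get searched first.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    prior: float = 1.0          # P(s, a) from the policy network; UCB1 ignores it
    visits: int = 0             # N(s, a)
    total_value: float = 0.0    # accumulated value; Q(s, a) = total_value / visits
    children: list = field(default_factory=list)

def ucb1_score(parent, child, c=1.4):
    """Classic UCB1: exploration bonus depends only on visit counts."""
    if child.visits == 0:
        return float("inf")     # always try unvisited moves first
    q = child.total_value / child.visits
    return q + c * math.sqrt(math.log(parent.visits) / child.visits)

def puct_score(parent, child, c_puct=1.0):
    """AlphaZero-style PUCT: the learned prior steers exploration."""
    q = child.total_value / child.visits if child.visits else 0.0
    return q + c_puct * child.prior * math.sqrt(parent.visits) / (1 + child.visits)

def select(parent, score_fn):
    """One selection step of MCTS: pick the child maximizing the score."""
    return max(parent.children, key=lambda ch: score_fn(parent, ch))

# Example: the high-prior move wins despite a slightly lower mean value.
root = Node(visits=10)
root.children = [Node(prior=0.7, visits=4, total_value=2.0),
                 Node(prior=0.1, visits=3, total_value=1.8)]
print(select(root, puct_score).prior)   # -> 0.7
```

Note the two rules differ only in the selection step; the rest of the MCTS loop (expand, evaluate, backpropagate) is unchanged, which is why I say AlphaZero is “just” MCTS with a learned guide.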

The original AlphaGo paper even mentions that they pitted the raw neural-network move predictions against the full version with network-guided MCTS, and the MCTS version won 100% of the time, which strongly suggests that search is an indispensable part of strong game-playing AI.