AlphaZero's black box is opened! DeepMind paper published in PNAS

Chess has always been a proving ground for AI. Seventy years ago, Alan Turing hypothesized that it would be possible to build a chess-playing machine that could learn on its own and continually improve from its own experience. Deep Blue, which appeared in the last century, was the first machine to defeat a human world champion, but it relied on experts to encode human chess knowledge. AlphaZero, born in 2017, realized Turing's conjecture as a neural-network-driven reinforcement learning machine.

AlphaZero does not need any hand-crafted heuristics and does not watch humans play chess; it is trained entirely by playing against itself.

So, has it really learned human concepts about chess? This is a neural network interpretability problem.

To answer this question, AlphaZero author Demis Hassabis worked with colleagues at DeepMind and researchers from Google Brain on a study that looks for evidence of human chess concepts in AlphaZero's neural network. The study shows when and where in the training process the network acquires these concepts, and also finds that AlphaZero's chess-playing style differs from that of humans. The paper was recently published in PNAS.

Paper address: https://www.pnas.org/doi/epdf/10.1073/pnas.2206625119

AlphaZero obtains human chess concepts in training

AlphaZero's network architecture consists of a residual network (ResNet) backbone and separate policy and value heads. The ResNet is a series of layers made up of network blocks and skip connections.
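To make the shape of this architecture concrete, here is a minimal sketch in plain Python. It is a toy under stated assumptions, not AlphaZero's actual network: dense layers stand in for the convolutions, and the sizes, the function names, and the random weights are all hypothetical.

```python
import math
import random


def dense(x, weights, bias):
    """Fully connected layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def residual_block(x, w1, b1, w2, b2):
    """Two layers plus a skip connection: relu(x + f(x))."""
    h = relu(dense(x, w1, b1))
    h = dense(h, w2, b2)
    return relu([xi + hi for xi, hi in zip(x, h)])


def forward(x, blocks, policy_head, value_head):
    """Shared backbone, then two separate heads."""
    for block in blocks:
        x = residual_block(x, *block)
    policy = softmax(dense(x, *policy_head))       # distribution over moves
    value = math.tanh(dense(x, *value_head)[0])    # scalar in [-1, 1]
    return policy, value


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


# Toy sizes (assumptions): 8 features, 4 candidate moves, 3 residual blocks.
rng = random.Random(0)
width, n_moves = 8, 4
blocks = [(random_matrix(width, width, rng), [0.0] * width,
           random_matrix(width, width, rng), [0.0] * width)
          for _ in range(3)]
policy_head = (random_matrix(n_moves, width, rng), [0.0] * n_moves)
value_head = (random_matrix(1, width, rng), [0.0])
board_features = [rng.random() for _ in range(width)]

policy, value = forward(board_features, blocks, policy_head, value_head)
print(policy, value)
```

The skip connection is what lets each block refine the representation handed to it instead of replacing it, which is why the study can probe the activations after every block as a sequence of progressively processed board representations.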

In terms of training iterations, AlphaZero starts from a neural network with randomly initialized parameters, repeatedly plays against itself to learn to evaluate positions, and is trained many times on the data generated in the process.

To determine the extent to which the AlphaZero network represents the chess concepts that humans hold, this study used sparse linear probing to relate changes in the network's parameters during training to changes in human-understandable concepts.

Start by defining the concept as a user-defined function, shown in orange in Figure 1. A generalized linear function g is trained as a probe to approximate a chess concept c. The quality of the approximation g indicates how well the layer (linearly) encodes the concept. For a given concept, the process is repeated for every layer of every network in the sequence of networks produced during training.

Figure 1: Probing for human-encoded chess concepts in the AlphaZero network (blue).

For example, a function can determine whether our side or the opponent's side has a bishop (♗).
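A concept function of this kind can be sketched in a few lines of Python. The sketch assumes the position is given as a FEN string, in which uppercase letters are White's pieces and lowercase letters are Black's; the paper's concept functions operate on its own position encoding, and the name `has_bishop` is hypothetical.

```python
def has_bishop(fen: str, side: str = "w") -> int:
    """Return 1 if `side` ('w' or 'b') has at least one bishop, else 0."""
    placement = fen.split()[0]          # first FEN field: piece placement
    bishop = "B" if side == "w" else "b"
    return int(bishop in placement)


# Position after 1.e4, Black to move.
after_e4 = "rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1"
# King and knight versus king: no bishops on the board.
no_bishops = "4k3/8/8/8/8/8/8/4K2N w - - 0 1"

print(has_bishop(after_e4, "w"), has_bishop(no_bishops, "w"))
```

The function returns a 0/1 label for each position, and such labels are exactly what a probe is later asked to predict from the network's activations.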

Of course, there are many chess concepts more complex than this example. For piece mobility, for instance, you can write a function that compares the mobility scores of our pieces and the opponent's pieces.

In this experiment, the concept functions are specified in advance and encapsulate domain-specific knowledge of chess.

The next step is to train the probe. The researchers used 10^5 naturally occurring chess positions from the ChessBase dataset as a training set and trained a sparse regression probe g on the network's activations at depth d to predict the value of a given concept c.

By comparing the scores of different concept probes across networks at different training steps in AlphaZero's self-play cycle, and across different layers of each network, we can determine when and where the network learned a given concept.
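The idea of a sparse regression probe can be illustrated with a small L1-regularized linear regression fitted by coordinate descent. This is only a sketch under assumptions: the activations and concept values are synthetic, the penalty strength `lam` is arbitrary, and it covers only a continuous-valued concept. The article does not give the paper's exact probe settings, so none of the names or numbers below should be read as the authors' implementation.

```python
import random


def soft_threshold(value, lam):
    """Shrink `value` toward zero by `lam`; this is what zeroes out weights."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def fit_sparse_probe(X, y, lam=0.05, sweeps=100):
    """Minimize (1/2n)*||y - Xw - b||^2 + lam*||w||_1 by coordinate descent."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = sum(y) / n
    residual = [yi - b for yi in y]            # y - (Xw + b) with w = 0
    for _ in range(sweeps):
        for j in range(d):
            col = [row[j] for row in X]
            z = sum(v * v for v in col) / n
            if z == 0.0:
                continue
            # Correlation of feature j with the residual, with w[j]'s
            # own contribution added back in.
            rho = sum(c * (r + w[j] * c) for c, r in zip(col, residual)) / n
            new_wj = soft_threshold(rho, lam) / z
            delta = new_wj - w[j]
            if delta != 0.0:
                residual = [r - delta * c for r, c in zip(residual, col)]
                w[j] = new_wj
        shift = sum(residual) / n              # refit the intercept
        b += shift
        residual = [r - shift for r in residual]
    return w, b


def r2_score(X, y, w, b):
    """Coefficient of determination of the probe's predictions."""
    pred = [sum(wj * xj for wj, xj in zip(w, row)) + b for row in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Synthetic stand-ins: 8 "activation" features, of which only features
# 0 and 3 actually carry the concept.
rng = random.Random(0)
X = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(200)]
y = [2.0 * row[0] - 1.5 * row[3] + rng.gauss(0.0, 0.1) for row in X]

w, b = fit_sparse_probe(X[:150], y[:150])      # fit on a training split
held_out_r2 = r2_score(X[150:], y[150:], w, b) # score on held-out positions
print([round(v, 3) for v in w], round(held_out_r2, 3))
```

The L1 penalty pushes the weights of uninformative features toward zero, so a high held-out score means that a small number of activation dimensions linearly encode the concept.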

Finally, we get a what-when-where plot for each concept, which visualizes three things: what concept is being computed, where in the network the computation occurs, and when the concept appears during training. See Figure 2.
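Structurally, a what-when-where plot is a grid of probe scores indexed by training step and layer. The sketch below builds such a grid and reads off a "when" and a "where" from it. The scoring function `toy_score` is a made-up stand-in for "train a probe on this checkpoint and layer, then return its held-out accuracy", and the step values, layer count, and threshold are assumptions chosen only for illustration.

```python
def what_when_where(score_fn, steps, layers):
    """grid[i][j] = probe score for checkpoint steps[i] at layer layers[j]."""
    return [[score_fn(step, layer) for layer in layers] for step in steps]


def first_step_above(grid, steps, threshold):
    """'When': earliest checkpoint where some layer exceeds the threshold."""
    for step, row in zip(steps, grid):
        if max(row) > threshold:
            return step
    return None


def shallowest_layer_above(grid, layers, threshold):
    """'Where': shallowest layer of the final checkpoint above the threshold."""
    for layer, score in zip(layers, grid[-1]):
        if score > threshold:
            return layer
    return None


def toy_score(step, layer):
    """Hypothetical score: grows with training, saturates after a few layers."""
    return min(step / 64_000, 1.0) * min(layer / 4, 1.0)


steps = [0, 2_000, 8_000, 32_000, 128_000]     # training checkpoints
layers = list(range(9))                        # layer 0 is the network input

grid = what_when_where(toy_score, steps, layers)
when = first_step_above(grid, steps, threshold=0.75)
where = shallowest_layer_above(grid, layers, threshold=0.75)
print(when, where)
```

In a real analysis each grid cell would come from fitting a separate probe for that concept, checkpoint, and layer, which is why one plot per concept is needed.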

Figure 2: The concepts in panels A to H are "evaluation of the total score", "Have we been defeated?", "assessment of threats", "Can we capture the enemy's queen?", "Will the enemy's move kill us?", "evaluation of pieces' score", "score of pieces", and "Do we have royal soldiers on our side?"

In panel C, as AlphaZero becomes stronger, the "threats" concept function and AlphaZero's representation (as detectable by a linear probe) become increasingly uncorrelated.

Such a what-when-where plot includes the two baselines needed to compare probing results: one is regression from the input, shown at layer 0, and the other is regression from the activations of a network with random weights, shown at training step 0. From the results in the figure above, it can be concluded that changes in regression accuracy are entirely determined by changes in the network's representation.

In addition, many what-when-where plots show the same pattern: regression accuracy is very low across the whole network until about 32k steps, and then starts to increase. It rises rapidly with network depth, then stabilizes and stays flat in later layers. So all concept-related computation happens relatively early in the network, while later residual blocks either perform move selection or compute features outside the given set of concepts.

Moreover, as training proceeds, many human-defined concepts can be predicted from AlphaZero’s representations with high prediction accuracy.

For more advanced concepts, the researchers found differences in when AlphaZero mastered them. The first concepts to become significantly different from zero, at 2k training steps, are "material" and "space"; more complex concepts such as "king_safety", "threats", and "mobility" become significantly different from zero at 8k training steps and do not increase substantially until after 32k training steps. This result is consistent with the sharp rise shown by the what-when-where plots in Figure 2.

In addition, a notable feature of most what-when-where plots is that regression accuracy rises rapidly in the early layers and then plateaus or decreases. This suggests that the set of concepts probed so far only captures what the earlier layers of the network compute, and that understanding later layers requires new concept-detection techniques.

AlphaZero’s opening strategy is different from humans

After observing that AlphaZero learned human chess concepts, the researchers went on to explore its understanding of chess through its opening strategies, because the choice of opening also reflects a player's understanding of related concepts.

The researchers observed that AlphaZero's opening strategy evolved differently from that of humans: over time, AlphaZero narrowed its options, while humans widened theirs.

Figure 3A shows the historical evolution of human preferences for White's first move. In the early period, e4 was the popular first move; later, opening strategy became more balanced and more flexible.

Figure 3B shows the evolution of AlphaZero’s opening strategy along with the training steps. As you can see, AlphaZero always starts by weighing all options equally and then gradually narrows down the options.

Figure 3: AlphaZero's and humans' preferences for the first move, compared over training steps and over time.

This is in sharp contrast to the evolution of human knowledge, which gradually expanded starting from e4, while AlphaZero clearly favors d4 in the later stages of training. This preference should not be over-interpreted, however, as self-play training is based on fast play with a lot of randomness added to encourage exploration.
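One simple way to put a number on "narrowing" versus "widening" is the Shannon entropy of the first-move distribution. This metric is an illustration chosen here and is not stated in the article as the measure used in the paper; the probabilities in `narrowed` are made up.

```python
import math


def entropy_bits(probabilities):
    """Shannon entropy in bits; higher means a broader spread of choices."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


# White has 20 legal first moves (16 pawn moves and 4 knight moves).
# Weighing them all equally gives the broadest possible repertoire.
uniform = [1 / 20] * 20

# Hypothetical late-training preferences concentrated on a few moves.
narrowed = {"d4": 0.7, "e4": 0.2, "Nf3": 0.1}

print(entropy_bits(uniform), entropy_bits(narrowed.values()))
```

Under this measure, a repertoire that starts uniform and then concentrates on a few moves shows falling entropy over training, while a repertoire that spreads out from a single dominant move shows rising entropy over time.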

The reason for this difference is unclear, but it reflects a fundamental difference between humans and artificial neural networks. One possible factor may be that historical data on human chess emphasizes the collective knowledge of master players, whereas AlphaZero's data includes beginner-level chess play and a single evolved strategy.

So, when AlphaZero’s neural network is trained multiple times, will it show a stable preference for certain opening strategies?

The finding is that in many cases this preference is not stable across training runs, and AlphaZero's opening strategies are very diverse. For example, in the classic Ruy Lopez opening (commonly known as the "Spanish opening"), AlphaZero shows an early preference in its choice of Black's reply after the typical moves 1.e4 e5 2.Nf3 Nc6 3.Bb5.

Figure 4: The Ruy Lopez opening

Across different training runs, AlphaZero gradually converges to one of 3...Nf6 and 3...a6. In addition, different versions of the AlphaZero model each showed a strong preference for one move over the other, and this preference was established early in training.

This is further evidence that there are many ways to play chess successfully, and that this diversity exists not only between humans and machines but also across different training runs of AlphaZero.

AlphaZero’s process of mastering knowledge

So, what is the connection between the above research results on opening strategies and AlphaZero’s understanding of concepts?

This study found that the what-when-where plots of various concepts have a clear inflection point that coincides with significant changes in opening preferences. The concepts of material and mobility in particular seem directly related to opening strategy.

The material concept is mainly learned between training steps 10k and 30k, and the concept of piece mobility is gradually integrated into AlphaZero’s value head during the same period. A basic understanding of the material value of chess pieces should precede an understanding of chess piece mobility. AlphaZero then incorporated this theory into opening preferences between 25k and 60k training steps.

The authors further analyzed how the AlphaZero network's knowledge of chess evolved: first the discovery of piece value; then an explosive growth of basic knowledge within a short time window, mainly concepts related to mobility; and finally a refinement phase, in which the network's opening strategy is refined over hundreds of thousands of training steps. Although the overall learning time is long, specific basic abilities emerge quickly within a relatively short period.

Former world chess champion Vladimir Kramnik was also brought in to provide support for this conclusion, and his observations were consistent with the process described above.

Finally, this work demonstrates that the board representation learned by the AlphaZero network can reconstruct many human chess concepts, and it details what concepts the network learned, when during training each concept was learned, and where in the network each concept is computed. Moreover, AlphaZero's chess-playing style is not the same as that of humans.

Now that we understand neural networks in terms of human-defined chess concepts, the next question will be: Can neural networks learn things beyond human knowledge?
