SmartGo Blog • Wishful Thinking

Wishful Thinking

by Anders Kierulf

2016-03-13

Lee Sedol’s strategy in game 4 worked brilliantly (well explained in the excellent Go Game Guru commentary). It took AlphaGo from godlike play to kyu-level petulance. When it no longer saw a clear path to victory, it started playing moves that made no sense.

AlphaGo optimizes its chance of winning, not its margin of victory. As long as that chance of winning was good, this worked well. When the chance of winning dropped, AlphaGo’s quality of play fell precipitously. Why?

Ineffective threats

The bad moves that AlphaGo played include moves 87 and 161: threats that just don’t work, as they can easily be refuted, and either lose points, or at least reduce future opportunities. When AlphaGo plays such a move, it’s smart enough to find the correct local answer and figure out that the move doesn’t actually work. However, the Monte Carlo Tree Search component (MCTS) will also look at other moves that don’t answer that threat, as there is always a chance that the opponent plays elsewhere. Thus AlphaGo sees a non-zero chance that this threat actually works, and the way MCTS calculates the statistics it thinks that this increases its chance of winning.

Of course, the opposite is true. Playing a threat that can easily be refuted is just wishful thinking. The value network would figure out that such an exchange actually makes the position worse, but it doesn’t know that it should override the Monte Carlo simulations in this case.

Adjusting komi

One way to avoid this effect is to internally adjust the komi until the program has a good chance of winning. This causes the program to play what it thinks are winning moves, while in fact it will lose by the few points you artificially adjusted the score. If the opponent makes a mistake, the program might regain a real winning position later. (SmartGo uses this technique; it also helps play more reasonable moves in handicap games.)

For AlphaGo, that technique won’t work well: as I understand it, the value network is trained to recognize whether positions are good for Black or for White, not by how many points a player is ahead.

Known unknowns

Another idea is to look at the source of uncertainty in MCTS. The Monte Carlo winning percentages are based on statistics from the playouts, and there are many uncertainties in that process due to the random nature of the playouts and the limited nature of the search. The more moves you look at, the smaller the unknowns become, and the statistical methods used to figure out which moves to explore more deeply and how to back up results in the search tree try to minimize these uncertainties.

However, whether the opponent will answer a threat is a yes-or-no decision; it should not be treated like a statistical unknown. In that case, you want to back up the results in the tree using minimax, not percentages. Something for the DeepMind team to work on before they challenge Ke Jie, so AlphaGo won’t throw another tantrum.