Back
 JILSA  Vol.10 No.2 , May 2018
Prediction Distortion in Monte Carlo Tree Search and an Improved Algorithm
Abstract: < ="font-family:Verdana;">Teaching computer programs to play games through machine learning has been an important way to achieve better artificial intelligence (AI) in a variety of real-world applications. Monte Carlo Tree Search (MCTS) is one of the key AI techniques developed recently that enabled AlphaGo to defeat a legendary professional Go player. What makes MCTS particularly attractive is that it only understands the basic rules of the game and does not rely on expert-level knowledge. Researchers thus expect that MCTS can be applied to other complex AI problems where domain-specific expert-level knowledge is not yet available. So far there are very few analytic studies in the literature. In this paper, our goal is to develop analytic studies of MCTS to build a more fundamental understanding of the algorithms and their applicability in complex AI problems. We start with a simple version of MCTS, called random playout search (RPS), to play Tic-Tac-Toe, and find that RPS may fail to discover the correct moves even in a very simple game position of Tic-Tac-Toe. Both the probability analysis and simulation have confirmed our discovery. We continue our studies with the full version of MCTS to play Gomoku and find that while MCTS has shown great success in playing more sophisticated games like Go, it is not effective to address the problem of sudden death/win. The main reason that MCTS often fails to detect sudden death/win lies in the random playout search nature of MCTS, which leads to prediction distortion. Therefore, although MCTS in theory converges to the optimal minimax search, with real world computational resource constraints, MCTS has to rely on RPS as an important step in its search process, therefore suffering from the same fundamental prediction distortion problem as RPS does. By examining the detailed statistics of the scores in MCTS, we investigate a variety of scenarios where MCTS fails to detect sudden death/win. Finally, we propose an improved MCTS algorithm by incorporating minimax search to overcome prediction distortion. Our simulation has confirmed the effectiveness of the proposed algorithm. We provide an estimate of the additional computational costs of this new algorithm to detect sudden death/win and discuss heuristic strategies to further reduce the search complexity.
Cite this paper: Li, W. (2018) Prediction Distortion in Monte Carlo Tree Search and an Improved Algorithm. Journal of Intelligent Learning Systems and Applications, 10, 46-79. doi: 10.4236/jilsa.2018.102004.
References

[1]   Schaeffer, J., Müller, M. and Kishimoto, A. (2014) Go-Bot, Go. IEEE Spectrum, 51, 48-53.
https://doi.org/10.1109/MSPEC.2014.6840803

[2]   https://en.wikipedia.org/wiki/Computer_Go

[3]   Nandy, A. and Biswas, M. (2018) Google’s DeepMind and the Future of Reinforcement Learning. In: Reinforcement Learning, Apress, Berkeley, 155-163.
https://doi.org/10.1007/978-1-4842-3285-9_6

[4]   Silver, D., et al. (2017) Mastering the Game of Go without Human Knowledge. Nature, 550, 354-359.
https://doi.org/10.1038/nature24270

[5]   Coulom, R. (2006) Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search. In: van den Herik, H.J., Ciancarini, P. and Donkers, H.H.L.M., Eds., Computers and Games. CG 2006. Lecture Notes in Computer Science, vol 4630, Springer, Berlin, Heidelberg, 72-83.

[6]   Kocsis, L. and Szepesvári, C. (2006) Bandit Based Monte-Carlo Planning. In: Fürnkranz, J., Scheffer, T. and Spiliopoulou, M., Eds., Machine Learning: ECML 2006. ECML 2006. Lecture Notes in Computer Science, vol 4212, Springer, Berlin, Heidelberg, 282-293.
https://doi.org/10.1007/11871842_29

[7]   Chaslot, G.M.J-B., Saito, J.-T., Bouzy, B., Uiterwijk, J.W.H.M. and van den Herik, H.J. (2006) Monte-Carlo Strategies for Computer Go. Proceedings of the 18th BeNeLux Conference on Artificial Intelligence, Belgium, 2006, 83-90.

[8]   Browne, C., Powley, E., Whitehouse, D., Lucas, S., Cowling, P.I., Rohlfshagen, P., Tavener, S., Perez, D., Samthrakis, S. and Colton, S. (2012) A Survey of Monte Carlo Tree Search Methods. IEEE Transactions on Computational Intelligence and AI in Games, 4, 1-49.
https://doi.org/10.1109/TCIAIG.2012.2186810

[9]   Auer, P., Cesa-Bianchi, N. and Fischer, P. (2002) Finite-Time Analysis of the Multi-Armed Bandit Problem. Machine Learning, 47, 235-256.
https://doi.org/10.1023/A:1013689704352

[10]   Chang, H., Fu, M., Hu, J. and Marcus, S. (2005) An Adaptive Sampling Algorithm for Solving Markov Decision Processes. Operations Research, 53, 126-139.
https://doi.org/10.1287/opre.1040.0145

[11]   Ramanujan, R., Sabharval, A. and Selman, B. (2010) On Adversarial Search Spaces and Sampling-Based Planning. Proceedings of the 20th International Conference on Automated Planning and Scheduling, Toronto, 12-16 May 2010, 242-245.

[12]   Browne, C. (2011) The Dangers of Random Playouts. ICGA Journal, 34, 25-26.
https://doi.org/10.3233/ICG-2011-34105

[13]   Teytaud, F. and Teytaud, O. (2010) On the Huge Benefit of Decisive Moves in Monte-Carlo Tree Search Algorithms. Proceedings of IEEE Conference on Computational Intelligence and Games, Copenhagen, 8-21 August 2010, 359-364.
https://doi.org/10.1109/ITW.2010.5593334

 
 
Top