L. P. Kaelbling, M. L. Littman and A. W. Moore, “Reinforcement Learning: A Survey,” Journal of Artificial Intelligence Research, Vol. 4, 1996, pp. 237-285.
R. S. Sutton and A. G. Barto, “Reinforcement Learning: An Introduction,” The MIT Press, Cambridge, MA, 1998.
R. S. Sutton, “Learning to Predict by the Methods of Temporal Differences,” Machine Learning, Vol. 3, 1988, pp. 9-44.
J. B. Pollack and A. D. Blair, “Why Did TD-Gammon Work,” In: D. S. Touretzky, M. C. Mozer and M. E. Hasselmo, Ed., Advances in Neural Information Processing Systems 8, MIT Press, Cambridge, MA, 1996, pp. 10-16.
D. B. Fogel, “Evolving a Checkers Player without Relying on Human Experience,” Intelligence, Vol. 11, No. 2, 2000, p. 217.
D. E. Moriarty, “Symbiotic Evolution of Neural Networks in Sequential Decision Tasks,” PhD thesis, Department of Computer Sciences, The University of Texas at Austin, USA, 1997.
G. Tesauro, “Practical Issues in Temporal Difference Learning,” In: D. S. Lippman, J. E. Moody and D. S. Touretzky, Ed., Advances in Neural Information Processing Systems 4, Morgan Kaufmann, San Mateo, CA, 1992, pp. 259-266.
G. J. Tesauro, “Temporal Difference Learning and TD-Gammon,” Communications of the ACM, Vol. 38, 1995, pp. 58-68.
S. Thrun, “Learning to Play the Game of Chess,” In: G. Tesauro, D. Touretzky and T. Leen, Ed., Advances in Neural Information Processing Systems 7, Morgan Kaufmann, San Francisco, CA, 1995, pp. 1069-1076.
J. Baxter, A. Tridgell and L. Weaver, “Knightcap: A Chess Program that Learns by Combining TD(λ) with Minimax Search,” Technical report, Australian National University, Canberra, 1997.
A. L. Samuel, “Some Studies in Machine Learning Using the Game of Checkers,” IBM Journal on Research and Development, Vol. 3, No. 3, 1959, pp. 210-229.
A. L. Samuel, “Some Studies in Machine Learning Using the Game of Checkers II—Recent Progress,” IBM Journal on Research and Development, Vol. 11, No. 6, 1967, pp. 601-617.
J. Schaeffer, M. Hlynka and V. Hussila, “Temporal Difference Learning Applied to a High-Performance Game,” In Seventeenth International Joint Conference on Artificial Intelligence, Seattle, WA, USA, 2001, pp. 529-534.
N. N. Schraudolph, P. Dayan and T. J. Sejnowski, “Temporal Difference Learning of Position Evaluation in the Game of Go,” In: J. D. Cowan, G. Tesauro and J. Alspector, Ed., Advances in Neural Information Processing Systems, Morgan Kaufmann, San Francisco, CA, 1994, pp. 817-824.
J. Furnkranz, “Machine Learning in Games: A Survey,” In: J. Furnkranz and M. Kubat, Ed., Machines that Learn to Play Games, Nova Science Publishers, Huntington, NY, 2001, pp. 11-59.
A. Plaat, “Research Re: search and Re-search,” PhD thesis, Erasmus University Rotterdam, Holland, 1996.
J. Schaeffer, “The Games Computers (and People) Play,” Advances in Computers, Vol. 50, 2000, pp. 189-266.
J. Schaeffer and A. Plaat, “Kasparov versus Deep Blue: The Re-match,” International Computer Chess Association Journal, Vol. 20, No. 2, 1997, pp. 95-102.
R. Coulom, “Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search,” Proceedings of the Fifth International Conference on Computers and Games, Turin, Italy, 2006, pp. 72-83.
R. Bellman, “Dynamic Programming,” Princeton University Press, USA, 1957.
J. N. Tsitsiklis, “Asynchronous Stochastic Approximation and Q-learning,” Machine Learning, Vol. 16, 1994, pp. 185-202.
A. G. Barto, R. S. Sutton and C. W. Anderson, “Neuronlike Adaptive Elements that Can Solve Difficult Learning Control Problems,” IEEE Transactions on Systems, Man and Cybernetics, Vol. 13, 1983, pp. 834-846.
M. A. Wiering and J. H. Schmidhuber, “Fast Online Q(λ),” Machine Learning, Vol. 33, No. 1, 1998, pp. 105-116.
C. M. Bishop, “Neural Networks for Pattern Recognition,” Oxford University, New York, 1995.
D. E. Rumelhart, G. E. Hinton and R. J. Williams, “Learning Internal Representations by Error Propagation,” In: D. E. Rumelhart and J. L. McClelland, Ed., Parallel Distributed Processing, MIT Press, USA, 1986, pp. 318-362.
L.-J. Lin, “Reinforcement Learning for Robots Using Neural Networks,” PhD thesis, Carnegie Mellon University, Pittsburgh, 1993.
H. Berliner, “Experiences in Evaluation with BKG—A Program that Plays Backgammon,” In Proceedings of the International Joint Conference on Artificial Intelligence, Vol. 1, 1977, pp. 428-433.
J. A. Boyan, “Modular Neural Networks for Learning Context-Dependent Game Strategies,” Master’s thesis, University of Chicago, USA, 1992.
R. Caruana and V. R. de Sa, “Promoting Poor Features to Supervisors: Some Inputs Work Better as Outputs,” In: M. C. Mozer, M. I. Jordan and T. Petsche, Ed., Advances in Neural Information Processing Systems 9, Morgan Kaufmann, San Mateo, CA, 1997, pp. 246-252.
A. Sperduti and A. Starita, “Speed up Learning and Network Optimization with Extended Backpropagation,” Neural Networks, Vol. 6, 1993, pp. 365-383.
X. Xu, D. Hu and X. Lu, “Kernel-Based Least Squares Policy Iteration for Reinforcement Learning,” IEEE Transactions on Neural Networks, Vol. 18, No. 4, 2007, pp. 973-992.
W. Smart and L. Kaelbling, “Effective Reinforcement Learning for Mobile Robots,” Proceedings of the IEEE International Conference on Robotics and Automation, Washington, DC, USA, 2002, pp. 3404-3410.