Control Task for Reinforcement Learning with Known Optimal Solution for Discrete and Continuous Actions

ABSTRACT

Most research in Reinforcement Learning (RL) concentrates on discrete sets of actions, but certain real-world problems require methods that can find good strategies using actions drawn from continuous sets. This paper describes a simple control task called direction finder and its known optimal solution for both discrete and continuous actions. The task allows RL solution methods to be compared on the basis of their value functions. To solve the control task for continuous actions, a simple idea for generalising them by means of feature vectors is presented. The resulting algorithm is applied using different choices of feature calculations. To compare their performance, a simple measure is introduced.

Cite this paper

M. ROTTGER and A. LIEHR, "Control Task for Reinforcement Learning with Known Optimal Solution for Discrete and Continuous Actions," *Journal of Intelligent Learning Systems and Applications*, Vol. 1, No. 1, 2009, pp. 28-41. doi: 10.4236/jilsa.2009.11002.
