Evaluating Effects of Two Alternative Filters for the Incremental Pruning Algorithm on Quality of Pomdp Exact Solutions

Author(s)
Mahdi Naser-Moghadasi

ABSTRACT

Decision making is one of the central problems in artificial intelligence and specifically in robotics. In most cases this problem comes with uncertainty both in data received by the decision maker/agent and in the actions performed in the environment. One effective method to solve this problem is to model the environment and the agent as a Partially Observable Markov Decision Process (POMDP). A POMDP has a wide range of applications such as: Machine Vision, Marketing, Network troubleshooting, Medical diagnosis etc. In recent years, there has been a significant interest in developing techniques for finding policies for (POMDPs).We consider two new techniques, called Recursive Point Filter (RPF) and Scan Line Filter (SCF) based on Incremental Pruning (IP) POMDP solver to introduce an alternative method to Linear Programming (LP) filter for IP. Both, RPF and SCF have solutions for several POMDP problems that LP could not converge to in 24 hours. Experiments are run on problems from POMDP literature, and an Average Discounted Reward (ADR) is computed by testing the policy in a simulated environment.

Decision making is one of the central problems in artificial intelligence and specifically in robotics. In most cases this problem comes with uncertainty both in data received by the decision maker/agent and in the actions performed in the environment. One effective method to solve this problem is to model the environment and the agent as a Partially Observable Markov Decision Process (POMDP). A POMDP has a wide range of applications such as: Machine Vision, Marketing, Network troubleshooting, Medical diagnosis etc. In recent years, there has been a significant interest in developing techniques for finding policies for (POMDPs).We consider two new techniques, called Recursive Point Filter (RPF) and Scan Line Filter (SCF) based on Incremental Pruning (IP) POMDP solver to introduce an alternative method to Linear Programming (LP) filter for IP. Both, RPF and SCF have solutions for several POMDP problems that LP could not converge to in 24 hours. Experiments are run on problems from POMDP literature, and an Average Discounted Reward (ADR) is computed by testing the policy in a simulated environment.

Cite this paper

M. Naser-Moghadasi, "Evaluating Effects of Two Alternative Filters for the Incremental Pruning Algorithm on Quality of Pomdp Exact Solutions,"*International Journal of Intelligence Science*, Vol. 2 No. 1, 2012, pp. 1-8. doi: 10.4236/ijis.2012.21001.

M. Naser-Moghadasi, "Evaluating Effects of Two Alternative Filters for the Incremental Pruning Algorithm on Quality of Pomdp Exact Solutions,"

References

[1] G. E. Monahan, “A Survey of Partially Observable Markov Decision Processes,” Management Science, Vol. 28, No. 1, 1982, pp. 1-16. doi:10.1287/mnsc.28.1.1

[2] R. D. Smallwood and E. J. Sondik, “The Optimal Control of Partially Observable Markov Processes over a Finite Horizon,” Operations Research, Vol. 21, 1973, pp. 1071- 1088. doi:10.1287/opre.21.5.1071

[3] P. E. Caines, “Linear Stochastic Systems,” John Wiley, New York, 1988.

[4] M. T. J. Spaan, “Cooperative Active Perception Using POMDPs,” AAAI 2008 Workshop on Advancements in POMDP Solvers, 2008.

[5] J. Goldsmith and M. Mundhenk, “Complexity Issues in Markov Decision Processes,” Proceedings of the IEEE Conference on Computational Complexity, New York, 1998.

[6] J. Pineau, G. Gordon and S. Thrun, “Point-Based Value Iteration: An Anytime Algorithm for POMDPs,” Proceedings of the International Joint Conference on Artificial Intelligence, Acapulco, 2003.

[7] T. Smith and R. G. Simmons, “Heuristic Search Value Iteration for POMDPs,” Proceedings of the International Conference on Uncertainty in Artificial Intelligence, Banff, 2004.

[8] M. T. J. Spaan and N. Vlassis, “Randomized Point-Based Value Iteration for POMDPs,” Journal of Artificial Intelligence Research, Vol. 24, 2005, pp. 195-120.

[9] A. Cassandra, M. L. Littman and N. L. Zhang, “Incremental Pruning: A Simple, Fast, Exact Algorithm for Partially Observable Markov Decision Processes,” Proceedings of the 13th Annual Conference on Uncertainty in Artificial Intelligence, Brown, 1997.

[10] R. Moore, “A Formal Theory of Knowledge and Action,” In: J. Hobbs and R. Moore, Eds., Formal Theories of the Commonsense World, Norwood, 1985, pp. 319-358.

[11] A. R. Cassandra, L. P. Kaelbling and M. L. Littman, “Acting Optimally in Partially Observable Stochastic Domains,” Proceedings of the 12th National Conference on Artificial Intelligence, Seattle, 1994.

[12] M. Littman, A. Cassandra and K. L. P, “Efficient Dynamic-Programming Updates in Partially Observable Markov Decision Process,” Brown University, Providence, 1996.

[13] L. D. Pyeatt and A. E. Howe, “A Parallel Algorithm for POMDP Solution,” Proceedings of the 5th European Conference on Planning, Durham, 1999, pp. 73-83.

[14] A. R. Cassandra, “Exact and Approximate Algorithms for Partially Observable Markov Decision Process,” PhD Thesis, Brown University, Brown, 1998.

[1] G. E. Monahan, “A Survey of Partially Observable Markov Decision Processes,” Management Science, Vol. 28, No. 1, 1982, pp. 1-16. doi:10.1287/mnsc.28.1.1

[2] R. D. Smallwood and E. J. Sondik, “The Optimal Control of Partially Observable Markov Processes over a Finite Horizon,” Operations Research, Vol. 21, 1973, pp. 1071- 1088. doi:10.1287/opre.21.5.1071

[3] P. E. Caines, “Linear Stochastic Systems,” John Wiley, New York, 1988.

[4] M. T. J. Spaan, “Cooperative Active Perception Using POMDPs,” AAAI 2008 Workshop on Advancements in POMDP Solvers, 2008.

[5] J. Goldsmith and M. Mundhenk, “Complexity Issues in Markov Decision Processes,” Proceedings of the IEEE Conference on Computational Complexity, New York, 1998.

[6] J. Pineau, G. Gordon and S. Thrun, “Point-Based Value Iteration: An Anytime Algorithm for POMDPs,” Proceedings of the International Joint Conference on Artificial Intelligence, Acapulco, 2003.

[7] T. Smith and R. G. Simmons, “Heuristic Search Value Iteration for POMDPs,” Proceedings of the International Conference on Uncertainty in Artificial Intelligence, Banff, 2004.

[8] M. T. J. Spaan and N. Vlassis, “Randomized Point-Based Value Iteration for POMDPs,” Journal of Artificial Intelligence Research, Vol. 24, 2005, pp. 195-120.

[9] A. Cassandra, M. L. Littman and N. L. Zhang, “Incremental Pruning: A Simple, Fast, Exact Algorithm for Partially Observable Markov Decision Processes,” Proceedings of the 13th Annual Conference on Uncertainty in Artificial Intelligence, Brown, 1997.

[10] R. Moore, “A Formal Theory of Knowledge and Action,” In: J. Hobbs and R. Moore, Eds., Formal Theories of the Commonsense World, Norwood, 1985, pp. 319-358.

[11] A. R. Cassandra, L. P. Kaelbling and M. L. Littman, “Acting Optimally in Partially Observable Stochastic Domains,” Proceedings of the 12th National Conference on Artificial Intelligence, Seattle, 1994.

[12] M. Littman, A. Cassandra and K. L. P, “Efficient Dynamic-Programming Updates in Partially Observable Markov Decision Process,” Brown University, Providence, 1996.

[13] L. D. Pyeatt and A. E. Howe, “A Parallel Algorithm for POMDP Solution,” Proceedings of the 5th European Conference on Planning, Durham, 1999, pp. 73-83.

[14] A. R. Cassandra, “Exact and Approximate Algorithms for Partially Observable Markov Decision Process,” PhD Thesis, Brown University, Brown, 1998.