IJIS Vol.5 No.5, October 2015
Compound Hidden Markov Model for Activity Labelling
ABSTRACT
This research presents a novel way of labelling human activities from the skeleton output that vision-based motion capture systems compute from RGB-D data. The activities are labelled by means of a Compound Hidden Markov Model, which is formed by linking several Linear Hidden Markov Models through common states. Each Linear Hidden Markov Model encodes the motion information of one human activity. Given a sequence of observations, the sequence of most likely states indicates which activities a person performs in an interval of time. The purpose of this research is to give a service robot the capability of human activity awareness, which can be used for action planning with implicit and indirect Human-Robot Interaction. The proposed Compound Hidden Markov Model, built from one Linear Hidden Markov Model per activity, labels activities from unknown subjects with an average accuracy of 59.37%. This is higher than the average labelling accuracy for activities of unknown subjects obtained with an Ergodic Hidden Markov Model (6.25%) and with a Compound Hidden Markov Model in which each activity is modelled by a single state (18.75%).
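
The idea of linking per-activity linear chains through a common state and reading activity labels off the most likely state sequence can be illustrated with a small sketch. It is not the authors' implementation: the activity names, the number of states per chain, the transition and emission probabilities, the discrete observation symbols, and the helper names `build_compound_hmm` and `viterbi` are all hypothetical, and the standard Viterbi algorithm is assumed as the way to obtain the most likely states.

```python
import math

def build_compound_hmm(activity_lengths, stay=0.6):
    """Link one left-to-right chain per activity through a shared state 0.

    All probabilities here are toy values chosen for illustration only.
    """
    names = list(activity_lengths)
    owner = ["common"]              # activity label of every compound state
    first = {}                      # index of the first state of each chain
    for name in names:
        first[name] = len(owner)
        owner.extend([name] * activity_lengths[name])
    n = len(owner)
    trans = [[0.0] * n for _ in range(n)]
    # Common state: stay, or enter any activity chain with equal probability.
    trans[0][0] = stay
    for name in names:
        trans[0][first[name]] = (1.0 - stay) / len(names)
    # Each chain is linear: self-loop or step forward; the last state
    # returns to the common state, which links the chains together.
    for name in names:
        start, length = first[name], activity_lengths[name]
        for k in range(length):
            s = start + k
            trans[s][s] = stay
            trans[s][s + 1 if k < length - 1 else 0] = 1.0 - stay
    return owner, trans

def viterbi(obs, init, trans, emit):
    """Most likely state sequence for discrete observations (log domain)."""
    def log(p):
        return math.log(p) if p > 0 else float("-inf")
    n = len(init)
    score = [log(init[s]) + log(emit[s][obs[0]]) for s in range(n)]
    back = []
    for o in obs[1:]:
        prev, ptr, score = score, [], []
        for s in range(n):
            best = max(range(n), key=lambda r: prev[r] + log(trans[r][s]))
            ptr.append(best)
            score.append(prev[best] + log(trans[best][s]) + log(emit[s][o]))
        back.append(ptr)
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):      # follow back-pointers to the start
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path

# Two hypothetical activities with two states each, plus the common state.
owner, trans = build_compound_hmm({"wave": 2, "sit": 2})
n = len(owner)
# Toy emissions: state s emits symbol s with probability 0.8, others 0.05.
emit = [[0.8 if o == s else 0.05 for o in range(n)] for s in range(n)]
init = [1.0] + [0.0] * (n - 1)      # always start in the common state

obs = [0, 1, 1, 2, 2, 0, 3, 4, 4]
path = viterbi(obs, init, trans, emit)
labels = [owner[s] for s in path]
print(labels)
```

With these toy parameters the decoded states follow the "wave" chain, return to the common state, and then follow the "sit" chain, so each frame receives the label of the chain that owns its state. A real system would use emission models estimated from skeleton features rather than the hand-set table used here.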

Cite this paper
Figueroa-Angulo, J., Savage, J., Bribiesca, E., Escalante, B. and Sucar, L. (2015) Compound Hidden Markov Model for Activity Labelling. International Journal of Intelligence Science, 5, 177-195. doi: 10.4236/ijis.2015.55016.