JBiSE, Vol. 9, No. 10B, September 2016
Feature Optimization of Speech Emotion Recognition
Abstract:
In this paper, speech emotion is divided into four categories: Fear, Happy, Neutral, and Surprise. Traditional features and their statistics are generally applied to recognize speech emotion. To quantify each feature's contribution to emotion recognition, a method based on the Back Propagation (BP) neural network is adopted, from which the optimal subset of the features is obtained. In addition, two new characteristics of speech emotion, the MFCC feature extracted from the fundamental frequency curve (MFCCF0) and the amplitude perturbation parameters extracted from the short-time average magnitude curve (APSAM), are added to the selected features. With the Gaussian Mixture Model (GMM), we obtain a highest average recognition rate of 82.25% over the four emotions, and a recognition rate of 90% for Neutral.
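The paper itself includes no code, but the pipeline the abstract outlines (BP-network-based feature ranking to pick an optimal subset, then per-emotion GMM classification) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses scikit-learn's MLPClassifier as the BP network, scores each feature by the summed magnitude of its first-layer weights (one common MLP-based selection heuristic), and fits one diagonal-covariance GMM per emotion, classifying by maximum log-likelihood. Feature extraction (including MFCCF0 and APSAM) is assumed to have already produced the matrix X; the names and parameters here are illustrative only.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.mixture import GaussianMixture

EMOTIONS = ["Fear", "Happy", "Neutral", "Surprise"]

def rank_features(X, y, hidden=32, seed=0):
    """Train a BP (multilayer perceptron) network and score each input
    feature by the summed absolute magnitude of its first-layer weights."""
    net = MLPClassifier(hidden_layer_sizes=(hidden,), max_iter=1000,
                        random_state=seed).fit(X, y)
    # coefs_[0] has shape (n_features, n_hidden)
    return np.abs(net.coefs_[0]).sum(axis=1)

def select_subset(X, y, k):
    """Return the indices of the k features with the largest scores."""
    return np.argsort(rank_features(X, y))[::-1][:k]

def fit_gmms(X, y, n_components=4):
    """Fit one GMM per emotion class on that class's samples."""
    return {e: GaussianMixture(n_components=n_components,
                               covariance_type="diag",
                               random_state=0).fit(X[y == e])
            for e in EMOTIONS}

def predict(gmms, X):
    """Assign each sample to the emotion whose GMM gives it the
    highest log-likelihood."""
    ll = np.column_stack([gmms[e].score_samples(X) for e in EMOTIONS])
    return np.array(EMOTIONS)[ll.argmax(axis=1)]
```

In use, one would call select_subset on the training features and labels, slice both training and test matrices to those columns, fit the GMMs on the reduced training set, and compare predict's output against held-out labels to estimate a recognition rate.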
Cite this paper: Yu, C., Xie, L. and Hu, W. (2016) Feature Optimization of Speech Emotion Recognition. Journal of Biomedical Science and Engineering, 9, 37-43. doi: 10.4236/jbise.2016.910B005.