ENG, Vol. 5, No. 10B, October 2013
Emotional Speech Synthesis Based on Prosodic Feature Modification

Emotional speech synthesis has wide applications in fields such as human-computer interaction, medicine, and industry. This work proposes an emotional speech synthesis system based on prosodic feature modification and the Time-Domain Pitch-Synchronous Overlap-Add (TD-PSOLA) waveform concatenation algorithm. The system produces synthesized speech with four types of emotion: angry, happy, sad, and bored. The experimental results show that the synthesized utterances convey clear emotional expression, and that a subjective test reaches high classification accuracy across the different types of synthesized emotional speech.
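
As an illustration of the kind of prosodic modification the abstract refers to, the following is a minimal sketch of a TD-PSOLA-style pitch and duration change, not the authors' implementation. It assumes a fully voiced signal with a known, constant pitch period, so the pitch marks can be placed at regular intervals; a real system would need pitch-mark detection and voiced/unvoiced handling. The function name `td_psola` and the parameters `period`, `pitch_scale`, and `time_scale` are hypothetical names chosen for this example.

```python
import numpy as np

def td_psola(signal, period, pitch_scale=1.0, time_scale=1.0):
    """Simplified TD-PSOLA sketch.

    Assumptions (not from the paper): the signal is voiced throughout,
    its pitch period is constant and given in samples, and it is at
    least a few periods long.
    """
    signal = np.asarray(signal, dtype=float)
    # Analysis pitch marks: one per period, evenly spaced (constant F0).
    marks = np.arange(period, len(signal) - period, period)
    # Two-period Hann window centred on each pitch mark.
    window = np.hanning(2 * period + 1)

    out_len = int(round(len(signal) * time_scale))
    new_period = max(1, int(round(period / pitch_scale)))
    out = np.zeros(out_len)
    norm = np.zeros(out_len)

    # Synthesis pitch marks are spaced by the new period.
    for t in range(period, out_len - period, new_period):
        # Map the synthesis instant back to the analysis time axis
        # and reuse the nearest analysis grain.
        src = marks[np.argmin(np.abs(marks - t / time_scale))]
        grain = signal[src - period: src + period + 1] * window
        out[t - period: t + period + 1] += grain
        norm[t - period: t + period + 1] += window

    # Divide by the summed windows to compensate for the changed overlap.
    covered = norm > 1e-8
    out[covered] /= norm[covered]
    return out

# Example: a 100 Hz tone sampled at 8 kHz has a period of 80 samples.
fs = 8000
tone = np.sin(2 * np.pi * 100 * np.arange(fs) / fs)
raised = td_psola(tone, period=80, pitch_scale=1.25)    # new period: 64 samples
stretched = td_psola(tone, period=80, time_scale=1.5)   # 1.5x longer output
```

Respacing the grains changes the fundamental period, while repeating or skipping grains changes the duration. Any mapping from an emotion to particular `pitch_scale` or `time_scale` values would have to come from the paper's own prosodic analysis and is not assumed here.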

Cite this paper: He, L., Huang, H. and Lech, M. (2013) Emotional Speech Synthesis Based on Prosodic Feature Modification. Engineering, 5, 73-77. doi: 10.4236/eng.2013.510B015.
