JCC Vol.2 No.9, July 2014
Speech Signal Recovery Based on Source Separation and Noise Suppression
Abstract

In this paper, a speech signal recovery algorithm is presented for a personalized voice-command automatic recognition system in vehicle and restaurant environments. The algorithm separates mixed speech from multiple speakers, detects the presence or absence of speakers by tracking the higher-magnitude portion of the speech power spectrum, and adaptively suppresses noise. An automatic speech recognition (ASR) process that handles the multi-speaker task is designed and implemented. Evaluation tests were carried out using the NOIZEUS speech database, and the experimental results show that the proposed algorithm achieves impressive performance improvements.


Cite this paper
Wang, Z., Zhang, H. and Bi, G. (2014) Speech Signal Recovery Based on Source Separation and Noise Suppression. Journal of Computer and Communications, 2, 112-120. doi: 10.4236/jcc.2014.29015.