A New Method for Speech Enhancement Based on Incoherent Model Learning in Wavelet Transform Domain

Mavaddati, Samira

doi:10.29252/jsdp.17.3.17

Signal and Data Processing Journal. A scientific journal officially licensed by the Commission for Scientific Publications of the (MSRT). Publisher: Research Center for Development of Technologies

Volume 17, Issue 3 (11-2020) JSDP 2020, 17(3): 17-36

Mavaddati S. A New Method for Speech Enhancement Based on Incoherent Model Learning in Wavelet Transform Domain. JSDP 2020; 17 (3) :17-36
URL: http://jsdp.rcisp.ac.ir/article-1-835-en.html

Samira Mavaddati*

University of Mazandaran

Abstract:

The quality of a speech signal degrades significantly in the presence of environmental noise, which impairs the performance of hearing aids, automatic speech recognition systems, and mobile phones. This paper addresses single-channel enhancement of speech corrupted by additive noise. A dictionary-based algorithm is proposed that trains speech and noise models for each subband of the wavelet decomposition according to a coherence criterion. The presented learning method minimizes both the self-coherence between the atoms of each dictionary and the mutual coherence between the atoms of the speech and noise dictionaries, which yields a lower sparse reconstruction error. To reduce computation time, a composite dictionary is used that contains only the speech dictionary and the one noise dictionary selected to match the noise condition of the test environment. The speech enhancement algorithm is introduced in two scenarios, supervised and semi-supervised. In each scenario, a voice activity detector (VAD) is employed that is based on the energy of the sparse coefficient matrices obtained when the observed data is coded over the related dictionary.
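As an illustration of the coherence criterion mentioned above, the following is a minimal sketch, not the author's learning algorithm. It only measures the two quantities the abstract says are minimized, using the standard definition of coherence as the largest absolute inner product between unit-norm atoms. The function names and the toy dictionaries are assumptions made for this example.

```python
import math


def normalize_atoms(atoms):
    """Scale each atom (a list of floats) to unit Euclidean norm."""
    normalized = []
    for atom in atoms:
        norm = math.sqrt(sum(v * v for v in atom))
        if norm == 0.0:
            raise ValueError("atoms must be non-zero")
        normalized.append([v / norm for v in atom])
    return normalized


def inner_product(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def self_coherence(atoms):
    """Largest |<d_i, d_j>| over distinct atoms of a single dictionary."""
    unit = normalize_atoms(atoms)
    best = 0.0
    for i in range(len(unit)):
        for j in range(i + 1, len(unit)):
            best = max(best, abs(inner_product(unit[i], unit[j])))
    return best


def mutual_coherence(atoms_a, atoms_b):
    """Largest |<a, b>| between atoms of two different dictionaries."""
    unit_a = normalize_atoms(atoms_a)
    unit_b = normalize_atoms(atoms_b)
    return max(abs(inner_product(a, b)) for a in unit_a for b in unit_b)


# Toy dictionaries (hypothetical values), one atom per inner list.
speech_atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
noise_atoms = [[0.0, 0.0, 2.0], [1.0, 1.0, 0.0]]

print("speech self-coherence:", self_coherence(speech_atoms))
print("speech/noise mutual coherence:", mutual_coherence(speech_atoms, noise_atoms))
```

A learning procedure of the kind described would drive both values toward zero, so that speech frames are poorly represented by noise atoms and vice versa.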
The enhancement schemes differ between the two scenarios. In the proposed supervised scenario, a domain adaptation technique transforms a learned noise dictionary into a dictionary adapted to the noise conditions of the test environment. With this step, the observed data is sparsely coded with a low approximation error that reflects the current state of the noisy environment. This technique plays a prominent role in obtaining better enhancement results, particularly when the noise is non-stationary. In the proposed semi-supervised scenario, the wavelet coefficients are adaptively thresholded based on the variance of the noise estimated for each frame in the different subbands. Both schemes are implemented under two conditions for the training and test steps: speaker-dependent and speaker-independent.
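To make the idea of variance-driven thresholding concrete, here is a minimal sketch under stated assumptions; it is not the paper's semi-supervised scheme. It uses a one-level Haar transform, estimates the noise standard deviation of a frame from the median absolute detail coefficient, and applies soft thresholding with the universal threshold. The Haar wavelet, the median-based estimator, the universal threshold, and all function names are choices made for this example, not details taken from the abstract.

```python
import math
import statistics


def haar_forward(frame):
    """One-level orthonormal Haar transform of an even-length frame."""
    s = math.sqrt(2.0)
    approx = [(frame[i] + frame[i + 1]) / s for i in range(0, len(frame), 2)]
    detail = [(frame[i] - frame[i + 1]) / s for i in range(0, len(frame), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert haar_forward, interleaving the reconstructed samples."""
    s = math.sqrt(2.0)
    frame = []
    for a, d in zip(approx, detail):
        frame.append((a + d) / s)
        frame.append((a - d) / s)
    return frame


def soft_threshold(coeffs, threshold):
    """Shrink each coefficient toward zero by `threshold`."""
    return [math.copysign(max(abs(c) - threshold, 0.0), c) for c in coeffs]


def denoise_frame(frame):
    """Threshold the detail subband using a per-frame noise estimate."""
    approx, detail = haar_forward(frame)
    # Assumed estimator: median absolute detail coefficient / 0.6745.
    sigma = statistics.median([abs(d) for d in detail]) / 0.6745
    # Assumed rule: universal threshold sigma * sqrt(2 ln N).
    threshold = sigma * math.sqrt(2.0 * math.log(len(frame)))
    return haar_inverse(approx, soft_threshold(detail, threshold))


# Hypothetical noisy frame; a real system would process each subband
# of a deeper wavelet decomposition frame by frame.
noisy = [1.0, 1.2, 0.9, 1.1, 1.0, 0.8, 1.1, 0.9]
print(denoise_frame(noisy))
```

Because the threshold is recomputed for every frame, it tracks changes in the noise level over time, which is the property the adaptive scheme relies on.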
Several measures are applied to evaluate the performance of the presented enhancement procedures, and a statistical test is used to compare the considered methods more precisely across the various noisy conditions. The experimental results show that the presented supervised enhancement scheme leads to much better results than the baseline enhancement methods, learning-based approaches, and earlier wavelet-based algorithms. These results were obtained for a wide range of noise types, including structured, unstructured, and periodic noise signals, at different SNR values.

Keywords: Speech enhancement, Dictionary learning, Sparse representation, Domain adaptation, Voice activity detector, Wavelet transform

Type of Study: Research | Subject: Paper
Received: 2018/02/7 | Accepted: 2019/06/19 | Published: 2020/12/5 | ePublished: 2020/12/5

Rights and permissions
	This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.