Volume 17, Issue 1 (6-2020) | JSDP 2020, 17(1): 99-116

Mavaddati S, Ahadi M. Speech Enhancement using Adaptive Data-Based Dictionary Learning. JSDP. 2020; 17 (1) :99-116
URL: http://jsdp.rcisp.ac.ir/article-1-805-en.html
University of Mazandaran
Abstract:
This paper presents a speech enhancement method based on sparse representation of data frames. Speech enhancement is widely used across signal processing applications. Its objective is to improve the intelligibility or the quality of speech signals by attenuating background noise while introducing as little distortion as possible into the speech itself. We focus on single-channel enhancement of speech corrupted by additive Gaussian noise. In recent years, interest in sparse representation techniques for speech enhancement has grown. A sparse representation captures the main information in a speech signal using only a small number of basis elements. The capability of a sparse decomposition method depends on the learned dictionary and on how well the dictionary atoms match the signal features. An overcomplete dictionary is obtained in two main steps: dictionary learning and sparse coding. For the dictionary, a predefined basis such as the Fourier, wavelet, or discrete cosine basis can be selected. Alternatively, a redundant dictionary can be constructed through a learning process, which is often based on alternating optimization strategies. In the sparse coding step, the dictionary is fixed and a sparse coefficient matrix with low approximation error is computed. The goal of this paper is to investigate the role of data-based dictionary learning in speech enhancement in the presence of white Gaussian noise. The dictionary learning method used here is based on the greedy adaptive algorithm, a data-based technique.
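The sparse coding step described above, in which the dictionary is fixed and a sparse coefficient vector is computed for each frame, can be illustrated with a minimal sketch. The abstract does not name the sparse coder used in the paper, so orthogonal matching pursuit is used here purely as an assumption, together with a predefined overcomplete discrete cosine dictionary. The function names `dct_dictionary` and `omp` and the frame sizes are hypothetical.

```python
import numpy as np

def dct_dictionary(n_samples: int, n_atoms: int) -> np.ndarray:
    """Predefined overcomplete cosine dictionary with unit-norm columns (atoms)."""
    t = np.arange(n_samples)[:, None]
    k = np.arange(n_atoms)[None, :]
    D = np.cos(np.pi * (t + 0.5) * k / n_atoms)
    return D / np.linalg.norm(D, axis=0)

def omp(D: np.ndarray, x: np.ndarray, n_nonzero: int) -> np.ndarray:
    """Orthogonal matching pursuit: approximate x with at most n_nonzero atoms of D."""
    x = np.asarray(x, dtype=float)
    residual = x.copy()
    support = []
    sol = np.zeros(0)
    for _ in range(n_nonzero):
        # Pick the atom most correlated with the current residual.
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k in support:
            break  # no new atom improves the fit
        support.append(k)
        # Refit all selected coefficients by least squares.
        sol, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ sol
    coeffs = np.zeros(D.shape[1])
    coeffs[support] = sol
    return coeffs

# Usage: code one synthetic frame with at most 4 active atoms.
rng = np.random.default_rng(0)
D = dct_dictionary(32, 64)
frame = rng.standard_normal(32)
coeffs = omp(D, frame, 4)
print("active atoms:", np.count_nonzero(coeffs))
print("relative error:", np.linalg.norm(frame - D @ coeffs) / np.linalg.norm(frame))
```

In an enhancement setting, the sparsity limit acts as the denoising control: white Gaussian noise has no compact representation in the dictionary, so a small number of atoms tends to retain the speech structure and discard much of the noise.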
The dictionary atoms are learned from data frames taken from the speech signals, so the atoms retain the structure of the input frames. In the greedy adaptive approach, atoms are learned directly from the training data using a norm-based sparsity measure to improve the match between the data frames and the dictionary atoms. The sparsity measure proposed in this paper is based on the Gini coefficient: we present a new sparsity index using Gini coefficients in the greedy adaptive dictionary learning algorithm. These coefficients are set to find atoms with greater sparsity than those found by other sparsity indices defined on the norms of speech frames. The proposed learning method iteratively extracts the speech frame with the minimum sparsity index under these measures and adds the extracted atom to the dictionary matrix. The range of the sparsity parameter is selected from the initial silent frames of the speech signal in order to obtain a suitable dictionary. That is, a speech frame of the input data matrix can be added to the first columns of the overcomplete dictionary only when its structure is not similar to that of the noise frames. The data-based learning process makes the algorithm faster than other dictionary learning methods such as K-singular value decomposition (K-SVD), the method of optimal directions (MOD), and other optimization-based strategies. The sparsity of an input frame is measured with the Gini-based index, which takes smaller values for speech frames because of their sparse content. Conversely, a frame with a Gaussian noise structure yields high values of this parameter. The performance of the proposed method is evaluated using measures including improvement in signal-to-noise ratio (ISNR), the time-frequency representation of the atoms, and perceptual evaluation of speech quality (PESQ) scores.
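The greedy selection loop described above can be sketched as follows. The abstract does not give the exact form of the proposed Gini-based index, so this sketch makes two assumptions: it uses the standard Gini index of sorted magnitudes (0 for a flat frame, close to 1 for a frame dominated by one sample), and it defines a hypothetical index `1 - gini_index` so that smaller values mean sparser frames, consistent with the selection of the minimum-index frame. The residual update follows the usual greedy adaptive dictionary scheme of removing each new atom's contribution from all frames. The noise-based range on the sparsity parameter is not modeled, and all function names are illustrative.

```python
import numpy as np

def gini_index(frame) -> float:
    """Standard Gini index of the sorted magnitudes of a frame."""
    c = np.sort(np.abs(np.asarray(frame, dtype=float)))
    total = c.sum()
    if total == 0.0:
        return 0.0
    n = c.size
    k = np.arange(1, n + 1)
    return float(1.0 - 2.0 * np.sum(c / total * (n - k + 0.5) / n))

def sparsity_index(frame) -> float:
    """Hypothetical index: smaller means sparser (assumed form, not the paper's)."""
    return 1.0 - gini_index(frame)

def greedy_adaptive_dictionary(X, n_atoms: int) -> np.ndarray:
    """Greedily pick the sparsest residual frame as an atom, then deflate."""
    R = np.array(X, dtype=float)  # residual matrix, one frame per column
    atoms = []
    for _ in range(n_atoms):
        norms = np.linalg.norm(R, axis=0)
        valid = norms > 1e-10  # skip frames already fully explained
        if not valid.any():
            break
        scores = np.array([
            sparsity_index(R[:, j]) if valid[j] else np.inf
            for j in range(R.shape[1])
        ])
        j = int(np.argmin(scores))
        d = R[:, j] / norms[j]  # unit-norm atom
        atoms.append(d)
        R -= np.outer(d, d @ R)  # remove the atom's contribution from every frame
    if not atoms:
        return np.zeros((R.shape[0], 0))
    return np.column_stack(atoms)

# Usage: learn 4 atoms from 20 synthetic frames of length 8.
rng = np.random.default_rng(0)
frames = rng.standard_normal((8, 20))
D = greedy_adaptive_dictionary(frames, 4)
print(D.shape)
```

Because each atom is taken from the residual after earlier atoms have been projected out, the learned atoms are mutually orthogonal in this sketch. No optimization problem is solved per iteration, which is the reason such data-based schemes are typically faster than K-SVD or MOD.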
The proposed approach yields a significant reduction in background noise compared with other dictionary learning methods such as principal component analysis (PCA) and the norm-based learning method, which are traditional procedures in this context. The proposed speech enhancement method also performs well in terms of reconstruction error in the signal approximations. In addition, its computation time is reasonable, which is an important factor in dictionary learning methods.
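As an illustration of how the noise reduction is quantified, the ISNR measure named in the evaluation can be sketched as the difference between the output and input signal-to-noise ratios. The abstract does not state the formula, so the common definition is assumed here. The function names and the synthetic signals are hypothetical.

```python
import numpy as np

def snr_db(clean, estimate) -> float:
    """SNR in dB of an estimate relative to the clean reference signal."""
    clean = np.asarray(clean, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    error_power = np.sum((clean - estimate) ** 2)
    return float(10.0 * np.log10(np.sum(clean ** 2) / error_power))

def isnr_db(clean, noisy, enhanced) -> float:
    """Improvement in SNR: output SNR minus input SNR (assumed definition)."""
    return snr_db(clean, enhanced) - snr_db(clean, noisy)

# Usage with synthetic signals: the "enhanced" signal keeps 10% of the noise amplitude.
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 5 * np.linspace(0.0, 1.0, 800))
noise = 0.3 * rng.standard_normal(clean.size)
noisy = clean + noise
enhanced = clean + 0.1 * noise
print(round(isnr_db(clean, noisy, enhanced), 2))
```

A positive ISNR means the enhanced signal is closer to the clean reference than the noisy input was. The measure requires the clean signal, so it applies to controlled experiments rather than to recordings without a reference.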
Type of Study: Research | Subject: Paper
Received: 2017/11/6 | Accepted: 2020/01/22 | Published: 2020/06/21 | ePublished: 2020/06/21



© 2015 All Rights Reserved | Signal and Data Processing