Speech Enhancement Using Data-Driven Dictionary Learning

Mavaddaty, Samira; Ahadi, Mohammad

doi:10.29252/jsdp.17.1.99

Volume 17, Issue 1 (4-1399), pp. 99-116

Samira Mavaddaty^*, Mohammad Ahadi

University of Mazandaran

Abstract:

Speech enhancement is one of the most widely applied areas of speech processing. This paper investigates a speech enhancement method based on the principles of sparse representation. Sparse representation makes it possible to model most of the information needed to represent a signal using far fewer dimensions than the original basis space. The learning procedure in this paper is based on a modification of the data-driven greedy adaptive algorithm, in which the dictionary is learned directly from the data signal using a norm-based sparsity index, so that the atoms match the structure of the data more closely. The paper proposes a new sparsity index based on the Gini measure. In addition, the range of the sparsity parameter for noisy segments is determined from the initial frames of the speech signal and is used in a proposed procedure for constructing the dictionary. The enhancement results show that the proposed method, which selects data frames according to the introduced measure, performs better under various noise conditions than the norm-based sparsity index and other baseline algorithms in this area.
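The two sparsity indices named in the abstract, and the way a greedy adaptive scheme uses them to pick atoms directly from the data, can be illustrated with a minimal Python sketch. It assumes the common definitions of the norm-based index (the ratio of the l1 norm to the l2 norm) and of the Gini index computed on sorted magnitudes, together with a generic greedy loop that takes the sparsest residual frame as the next atom and then removes its contribution from all frames. The function names, the `tol` parameter, and the loop itself are assumptions made for illustration; this is not the authors' modified algorithm, and it omits the noise-frame handling described in the abstract.

```python
import numpy as np

def l1_l2_index(x, eps=1e-12):
    """Norm-based sparsity index ||x||_1 / ||x||_2 (smaller means sparser)."""
    return np.sum(np.abs(x)) / (np.linalg.norm(x) + eps)

def gini_index(x, eps=1e-12):
    """Gini sparsity index, between 0 and 1 - 1/N (larger means sparser)."""
    c = np.sort(np.abs(x))                 # magnitudes in ascending order
    n = c.size
    k = np.arange(1, n + 1)
    return 1.0 - 2.0 * np.sum(c / (c.sum() + eps) * (n - k + 0.5) / n)

def greedy_adaptive_dictionary(frames, n_atoms, use_gini=True, tol=1e-8):
    """Illustrative greedy loop; frames has shape (frame_len, n_frames)."""
    residual = np.array(frames, dtype=float)          # working copy of the data
    floor = tol * np.linalg.norm(residual, axis=0).max()
    atoms = []
    for _ in range(n_atoms):
        energy = np.linalg.norm(residual, axis=0)
        active = energy > floor                       # skip exhausted frames
        if not active.any():
            break
        if use_gini:
            scores = np.array([gini_index(col) for col in residual.T])
        else:
            # Negate so that a larger score always means a sparser frame.
            scores = -np.array([l1_l2_index(col) for col in residual.T])
        scores[~active] = -np.inf
        j = int(np.argmax(scores))                    # sparsest remaining frame
        atom = residual[:, j] / energy[j]             # unit-norm atom
        atoms.append(atom)
        # Remove the component along the new atom from every frame.
        residual -= np.outer(atom, atom @ residual)
    if not atoms:
        return np.empty((residual.shape[0], 0))
    return np.stack(atoms, axis=1)
```

Calling `greedy_adaptive_dictionary(frames, n_atoms=5)` on a matrix whose columns are signal frames returns a matrix with one atom per column. Because each atom is drawn from a residual that has already been projected away from the earlier atoms, the atoms in this sketch are mutually orthogonal.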

Keywords: speech enhancement, sparse representation, dictionary learning, data-driven, greedy adaptive, Gini sparsity index


Type of study: Research | Subject: Speech processing papers
Received: 1396/8/15 | Accepted: 1398/11/2 | Published: 1399/4/1 | Published online: 1399/4/1


Rights and permissions
	This article may be redistributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License.