A Comparison of Machine Learning Techniques for Persian Extractive Speech-to-Speech Summarization without Transcripts

Jafari, Hoda Sadat; Homayounpour, Mohammad Mehdi

doi:10.29252/jsdp.14.4.143

Volume 14, Issue 4 (12-1396), pp. 143-157

Jafari H S, Homayounpour M. A comparison of machine learning techniques for Persian Extractive Speech to Speech Summarization without Transcript. JSDP 2018; 14 (4): 143-157
URL: http://jsdp.rcisp.ac.ir/article-1-491-fa.html


Hoda Sadat Jafari*, Mohammad Mehdi Homayounpour

Amirkabir University of Technology

Abstract:

This paper studies extractive speech summarization using a range of machine learning methods. Summarizing a speech file means extracting its important and salient parts so that the information in speech files can be accessed, searched, extracted, and browsed more easily and at lower cost. The paper presents a new speech summarization method that does not rely on an automatic speech recognition system. Repeated patterns between two spoken sentences are identified directly from the speech signal using the S-DTW algorithm. After the similarity between two sentences has been determined and a number of features have been extracted from each sentence, the effect of different machine learning methods (supervised, unsupervised, and semi-supervised) is examined. Experiments were carried out on a corpus of read Persian news. The results show that, with suitable features and without a transcript, performance higher than that of the baseline methods can be achieved (a 3% increase compared with selecting the first sentences and a 5% increase compared with selecting the longest sentences, using the ROUGE-3 measure).
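As a rough illustration of the idea behind measuring similarity between two spoken sentences directly from acoustic features, the sketch below computes a plain dynamic time warping (DTW) cost between two sequences of feature vectors. It is not the authors' method: the abstract names S-DTW, which looks for matching sub-segments, whereas this sketch aligns the two sequences end to end. The function name `dtw_distance`, the Euclidean frame distance, the length normalisation, and the toy two-dimensional "frames" are all assumptions made for the example.

```python
import math


def dtw_distance(a, b):
    """Length-normalised DTW alignment cost between two sequences of feature vectors.

    Plain (whole-sequence) DTW, shown only as a simplified stand-in for
    the segmental variant named in the abstract.
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")
    # cost[i][j] holds the minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            frame_dist = math.dist(a[i - 1], b[j - 1])  # Euclidean distance (assumption)
            cost[i][j] = frame_dist + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    # Divide by n + m so that longer sequences are not penalised for length alone.
    return cost[n][m] / (n + m)


# Toy "frames"; real input would be per-frame acoustic features.
x = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
y = [(0.0, 0.0), (0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]  # x with its first frame repeated
z = [(5.0, 5.0), (6.0, 6.0), (7.0, 7.0)]

print(dtw_distance(x, y))  # 0.0, since warping absorbs the repeated frame
print(dtw_distance(x, z))  # larger, since the frames never coincide
```

A lower cost suggests the two utterances follow a similar acoustic trajectory. A pattern-discovery system would additionally need to search over sub-segments and apply a threshold, which this sketch leaves out.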

Keywords: extractive speech summarization, speech signal, key patterns, S-DTW algorithm, machine learning


Type of study: Applied | Article subject: Speech processing papers
Received: 1394/11/29 | Accepted: 1396/3/20 | Published: 1396/12/22 | Published online: 1396/12/22



This article may be redistributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License.
