Improving Precision of Keywords Extracted From Persian Text Using Word2Vec Algorithm

Hasni Ahangar, Mohammad Reza; Amiri Jezeh, Ali

doi:10.52547/jsdp.18.1.60

Volume 18, Issue 1 (3-1400), pages 51-60



Hasni Ahangar M R, Amiri jezeh A. Improving Precision of Keywords Extracted From Persian Text Using Word2Vec Algorithm. JSDP 2021; 18 (1): 51-60
URL: http://jsdp.rcisp.ac.ir/article-1-858-fa.html


Mohammad Reza Hasni Ahangar^*, Ali Amiri Jezeh

Imam Hossein Comprehensive University

Abstract:

Keywords are the important words of a document: they describe the text and play a major role in understanding its content quickly and accurately. Identifying keywords with conventional methods is time-consuming and costly. In this paper, we first use a feed-forward neural network, via the Word2Vec algorithm, to compute a word correlation matrix for a document. Then, using this correlation matrix and a small initial list of keywords, we extract the most similar words as a nearest-neighbors list. We sort the resulting list in descending order and select different percentages of words from the top of the list. For each percentage, we repeat the process of training the neural network, building the correlation matrix, and extracting the nearest-neighbors list ten times, and then compute the average precision, recall, and F-measure. We continue this until the best evaluation results are reached. The results show that selecting at most forty percent of the words from the top of the nearest-neighbors list yields acceptable results. The algorithm was tested on a corpus of eight hundred news items whose keywords we extracted manually, and the experiments show that the precision of the proposed method is 78 percent.
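The selection and evaluation steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy two-dimensional vectors stand in for embeddings that Word2Vec would learn from a document, the function names are hypothetical, and scoring each candidate by its highest cosine similarity to any seed keyword is an assumption about how the nearest-neighbors list is ranked. Averaging over ten training runs is omitted because no training takes place here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_neighbors(vectors, seeds):
    """Score every non-seed word by its best similarity to a seed (assumed
    ranking rule) and return (word, score) pairs sorted in descending order."""
    seed_set = set(seeds)
    scored = [
        (word, max(cosine(vec, vectors[s]) for s in seeds))
        for word, vec in vectors.items()
        if word not in seed_set
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def select_top_percent(ranked, percent):
    """Keep the first `percent` percent of the ranked words, rounded up."""
    keep = math.ceil(len(ranked) * percent / 100)
    return [word for word, _ in ranked[:keep]]


def precision_recall_f(predicted, gold):
    """Precision, recall and F-measure of predicted keywords against gold ones."""
    predicted, gold = set(predicted), set(gold)
    hits = len(predicted & gold)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    f_measure = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f_measure


# Hypothetical toy embeddings; real ones would come from a trained Word2Vec model.
toy_vectors = {
    "economy": (1.0, 0.0),
    "bank": (0.9, 0.1),
    "inflation": (0.8, 0.3),
    "market": (0.7, 0.7),
    "rain": (0.1, 0.9),
    "weather": (0.0, 1.0),
}
seed_keywords = ["economy"]                   # small initial keyword list
gold_keywords = {"bank", "inflation", "market"}  # manually assigned keywords

ranked = rank_neighbors(toy_vectors, seed_keywords)
selected = select_top_percent(ranked, 40)     # the 40 percent cutoff from the abstract
precision, recall, f_measure = precision_recall_f(selected, gold_keywords)
print(selected, precision, recall, f_measure)
```

In a setting closer to the paper, the three scores would be computed once per training run and averaged over the ten runs for each cutoff percentage, since Word2Vec training is stochastic and the ranking can differ between runs.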

Keywords: keywords, Word2Vec algorithm, neural network, feature weighting


Type of study: Applied | Subject: Text processing papers
Received: 1397/2/2 | Accepted: 1399/12/9 | Published: 1400/3/1 | Published online: 1400/3/1



License: This article may be redistributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License.
