Coreference Resolution in Persian Using Deep Neural Networks

Sahlani, Hossein; Hourali, Maryam; Minaei-Bidgoli, Behrouz

doi:10.29252/jsdp.17.2.138

Volume 17, Issue 2 (6-1399), pp. 121-138

Sahlani H, Hourali M, Minaei-Bidgoli B. Coreference resolution with deep learning in the Persian language. JSDP 2020; 17 (2): 121-138
URL: http://jsdp.rcisp.ac.ir/article-1-888-fa.html


Hossein Sahlani*, Maryam Hourali, Behrouz Minaei-Bidgoli

Malek Ashtar University of Technology

Abstract:

Given the proliferation of social networks and of television, radio, and internet news outlets, reading all of the available text, analyzing it, and uncovering the relationships among documents demands a very large investment of time and human effort; today this work is carried out with natural language processing techniques. One open challenge in this area is the low accuracy of coreference resolution systems, which causes spurious relations to be detected or correct relations to be missed. Solving the coreference resolution problem generally involves three steps: identifying named entities, extracting the features of those named entities, and resolving their coreference. Named entities have many features, and representing the different features (both consistent and inconsistent with the referent) in a graph makes it possible to derive a threshold from a combination of features. In this paper, several preprocessing steps were first applied to the corpus of the Khajeh Nasir research institute [1]. The data were then converted to numerical vectors using algorithms based on deep neural networks, after which an initial pruning was performed using a graph and the features described in the body of the paper. Graph-based approaches treat entities as a set of interrelated elements; analyzing the relations among the initial entities in the graph and weighting those relations yields higher-level, more relevant features and also partly reduces the contradictions caused by missing information. Finally, coreference resolution was performed with neural networks on the corpus referred to in [30] (the Uppsala test corpus). The results indicate that the proposed method is an improvement (reaching an accuracy of 62.09), as detailed in the body of the paper.
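The graph-based pruning idea in the abstract can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the features (an embedding vector plus an entity type), the penalty for conflicting types, the threshold value, and the name `prune_and_cluster` are all hypothetical, and the toy vectors stand in for the neural representations mentioned above.

```python
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sum(a * a for a in u) ** 0.5
    norm_v = sum(b * b for b in v) ** 0.5
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def prune_and_cluster(mentions, threshold):
    """Group mentions whose pairwise edge weight reaches the threshold.

    mentions: dict mapping a mention string to (vector, entity_type).
    Both features are assumptions made for this sketch.
    """
    names = list(mentions)
    parent = {name: name for name in names}  # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(names, 2):
        vec_a, type_a = mentions[a]
        vec_b, type_b = mentions[b]
        weight = cosine(vec_a, vec_b)      # supporting feature
        if type_a != type_b:
            weight -= 1.0                  # contradicting feature (assumed penalty)
        if weight >= threshold:            # edges below the threshold are pruned
            parent[find(a)] = find(b)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in clusters.values())


# Toy data: vectors and entity types are invented for illustration.
example_mentions = {
    "Tehran University": ([1.0, 0.0], "ORG"),
    "the university": ([0.9, 0.1], "ORG"),
    "the minister": ([0.0, 1.0], "PER"),
    "he": ([0.1, 0.9], "PER"),
}
print(prune_and_cluster(example_mentions, threshold=0.8))
```

Treating the surviving edges as connected components is only one way to read off candidate chains; the abstract itself leaves the final decision to a neural network applied after pruning.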

Keywords: coreference resolution, graph, named entity recognition, information extraction from text, deep neural networks

Type of study: Research | Subject: Text processing papers
Received: 1397/5/30 | Accepted: 1398/6/11 | Published: 1399/6/24 | ePublished: 1399/6/24 (Solar Hijri dates)

References

[1] Z. Rahimi and S. HosseinNejad, "Corpus based coreference resolution for Farsi text," JSDP, Vol. 17(1), pp. 79-98, 2020.

[2] S. HosseinNejad, Y. Shekofteh, and T. Emami Azadi, "A'laam Corpus: A Standard Corpus of Named Entity for Persian Language," JSDP, Vol. 14(3), pp. 127-142, 2017. [DOI:10.29252/jsdp.14.3.127]

[3] P. S. Mortazavi and M. Shamsfard, "Recognition of named entities in Persian texts," in 15th Annual Conference of Computer Society of Iran, Tehran, 2009.

[4] A. AleAhmad, H. Amiri, E. Darrudi, M. Rahgozar, and F. Oroumchian, "Hamshahri: A Standard Persian text collection," Knowledge-Based Systems, Vol. 22(5), pp. 382-387, 2009. [DOI:10.1016/j.knosys.2009.05.002]

[5] A. Rahman and V. Ng, "Coreference resolution with world knowledge," 49th Annual Meeting of the Association for Computational Linguistics, Vol. 1, 2013.

[6] A. Bagga and B. Baldwin, "Algorithms for scoring coreference chains," In Proceedings of the LREC Workshop on Linguistic Coreference, pp. 563-566, 1998.

[7] M. Bijankhan, J. Sheykhzadegan, M. Bahrani, and M. Ghayoomi, "Lessons from Building a Persian Written Corpus: Peykare," Language Resources and Evaluation, Vol. 45(2), pp. 143-164, 2011. [DOI:10.1007/s10579-010-9132-x]

[8] P. K. Choubey and R. Huang, "Event coreference resolution by iteratively unfolding inter-dependencies among events," In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pp. 2124-2133, 2017.

[9] K. Clark and C. D. Manning, "Deep Reinforcement Learning for Mention-Ranking Coreference Models," In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pp. 2256-2262, 2016.

[10] K. Clark and C. D. Manning, "Improving Coreference Resolution by Learning Entity-Level Distributed Representations," In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, Vol. 1, pp. 643-653, 2016.

[11] C. Nicolae and G. Nicolae, "Bestcut: A graph algorithm for coreference resolution," Conference on Empirical Methods in Natural Language Processing, 2014.

[12] P. Denis and J. Baldridge, "Specialized models and ranking for coreference resolution," In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, pp. 660-669, 2008.

[13] C. Duncan, L. Chan, H. Peng, H. Wu, S. Upadhyay, N. Gupta, C. Tsai, M. Sammons, and D. Roth, "UI CCG TAC-KBP2017 submissions: Entity discovery and linking, and event nugget detection and co-reference," In Proceedings of the Text Analysis Conference, 2017.

[14] A. Haghighi and D. Klein, "Simple coreference resolution with rich syntactic and semantic features," In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, Vol. 3, pp. 1152-1161, Association for Computational Linguistics, 2009. [DOI:10.3115/1699648.1699661]

[15] H. Lee, A. Chang, Y. Peirsman, N. Chambers, M. Surdeanu, and D. Jurafsky, "Deterministic coreference resolution based on entity-centric, precision-ranked rules," Computational Linguistics, 2013. [DOI:10.1162/COLI_a_00152]

[16] H. Ji and R. Grishman, "Knowledge base population: Successful approaches and challenges," In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 1148-1158, 2011.

[17] S. Jiang, Y. Li, T. Qin, Q. Meng, and B. Dong, "SRCB entity discovery and linking (EDL) and event nugget systems for TAC 2017," In Proceedings of the Text Analysis Conference, 2017.

[18] H. Peng, Y. Song, and D. Roth, "Event detection and co-reference with minimal supervision," In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pp. 392-402, 2016.

[19] S. P. Ponzetto and M. Strube, "Exploiting semantic role labeling, WordNet and Wikipedia for coreference resolution," Main Conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, 2014.

[20] H. Poon and P. Domingos, "Joint unsupervised coreference resolution with Markov logic," In Proceedings of the Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, pp. 650-659, 2008. [DOI:10.3115/1613715.1613796]

[21] H. Lee, A. Chang, Y. Peirsman, N. Chambers, M. Surdeanu, and D. Jurafsky, "Deterministic coreference resolution based on entity-centric, precision-ranked rules," Computational Linguistics, Vol. 39(4), pp. 885-916, 2013. [DOI:10.1162/COLI_a_00152]

[22] Z. Liu, J. Araki, E. Hovy, and T. Mitamura, "Supervised within-document event coreference using information propagation," In Proceedings of the Ninth Language Resources and Evaluation Conference, pp. 4539-4544, 2014.

[23] J. Lu and V. Ng, "Joint learning for event coreference resolution," In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, Vol. 1, pp. 90-101, 2017.

[24] J. Lu and V. Ng, "Learning antecedent structures for event coreference resolution," In Proceedings of the 16th IEEE International Conference on Machine Learning and Applications, pp. 113-118, 2017. [DOI:10.1002/cc.20233]

[25] J. Lu and V. Ng, "UTD's event nugget detection and coreference system at KBP 2017," In Proceedings of the Text Analysis Conference, 2017.

[26] X. Luo, "On coreference resolution performance metrics," In Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, pp. 25-32, 2005.

[27] A. McCallum and B. Wellner, "Conditional models of identity uncertainty with application to noun coreference," In Advances in Neural Information Processing Systems, pp. 905-912, 2005.

[28] V. Ng, "Supervised noun phrase coreference research: The first fifteen years," In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, pp. 1396-1411, 2010.

[29] M. Rasooli, M. Kouhestani, and A. Moloodi, "Development of a Persian Syntactic Dependency Treebank," In The 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT), Atlanta, USA, pp. 306-314, 2013.

[30] M. Seraji, B. Megyesi, and J. Nivre, "Bootstrapping a Persian Dependency Treebank," Special Issue of Linguistic Issues in Language Technology (LiLT), Heidelberg, Germany, 2012.

[31] W. M. Soon, H. T. Ng, and D. C. Y. Lim, "A machine learning approach to coreference resolution of noun phrases," Computational Linguistics, Vol. 27(4), pp. 521-544, 2001. [DOI:10.1162/089120101753342653]

[32] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, "Distributed representations of words and phrases and their compositionality," In C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 26, pp. 3111-3119, Curran Associates, Inc., 2013.

[33] O. Uryupina, M. Poesio, C. Giuliano, and K. Tymoshenko, "Disambiguation and Filtering Methods in Using Web Knowledge for Coreference Resolution," FLAIRS Conference, 2013.

[34] V. Ng, "Shallow Semantics for Coreference Resolution," 43rd Annual Meeting on Association for Computational Linguistics, 2017.

[35] M. Vilain, J. Burger, J. Aberdeen, D. Connolly, and L. Hirschman, "A model-theoretic coreference scoring scheme," Proceedings of the 6th Conference on Message Understanding, Association for Computational Linguistics, 1995.

[36] S. Wiseman, A. M. Rush, S. M. Shieber, and J. Weston, "Learning anaphoricity and antecedent ranking features for coreference resolution," ACL-IJCNLP, pp. 1416-1426, Beijing, China, 2015. [DOI:10.3115/v1/P15-1137]

[37] X. Yang, G. Zhou, J. Su, and C. L. Tan, "Coreference resolution using competition learning approach," 41st Annual Meeting on Association for Computational Linguistics, Vol. 1, 2013.

[38] X. Yang, J. Su, G. Zhou, and C. L. Tan, "An NP-cluster based approach to coreference resolution," Proceedings of the 20th International Conference on Computational Linguistics, Association for Computational Linguistics, 2014.

[39] X. Luo, "On coreference resolution performance metrics," Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, Association for Computational Linguistics, 2005. [DOI:10.3115/1220575.1220579]

[40] B. Yang, C. Cardie, and P. Frazier, "A hierarchical distance-dependent Bayesian model for event coreference resolution," Transactions of the Association for Computational Linguistics, Vol. 3, pp. 517-528, 2015. [DOI:10.1162/tacl_a_00155]

[41] D. Yu, X. Pan, B. Zhang, L. Huang, D. Lu, S. Whitehead, and H. Ji, "RPI BLENDER TAC-KBP2016 system description," In Proceedings of the Text Analysis Conference, 2016.

Rights and permissions
	This article may be redistributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License.
