Corpus-Based Coreference Resolution for Farsi Text

Rahimi, Zeinab; HosseinNejad, Shadi

doi:10.29252/jsdp.17.1.79

Volume 17, Issue 1 (4-1399 SH, 2020), pages 79-98

Rahimi Z, HosseinNejad S. Corpus based coreference resolution for Farsi text. JSDP 2020; 17 (1) :79-98
URL: http://jsdp.rcisp.ac.ir/article-1-873-fa.html

Zeinab Rahimi*, Shadi HosseinNejad

Khajeh Nasir al-Din Tusi Research Institute for the Development of Advanced Technologies

Abstract:

Coreference resolution (also called reference resolution), that is, finding the words in a text that refer to the same thing, is an important task in natural language processing and a key component in applications such as automatic summarization, automatic question answering, and information extraction. By definition, two words are coreferent when both refer to the same entity in the text or in the real world. Many approaches to this problem have been proposed, and according to these studies coreference resolution can be carried out with rule-based methods, heuristic methods, or machine learning methods (supervised or unsupervised). Notably, annotated corpora have become widely used for this task in recent years and have produced good results. Building on this trend, the present study develops a coreference corpus of about one million words that also carries named-entity tags. In the coreference layer, all noun phrases, pronouns, and named entities are annotated, and the named-entity layer uses seven tags. Using this corpus, an automatic coreference resolution tool based on a support vector machine (SVM) was built; its accuracy on the gold-standard test data is about 60%.
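The abstract does not describe the features, the linking strategy, or the SVM implementation behind the tool, so the following is only an illustration of one common way to use an SVM for this task, the mention-pair model, under assumptions that are not from the paper. The feature set, the placeholder NE tags `PER`/`LOC`, the hand-set weights, and closest-first linking are all hypothetical, and a small Pegasos-style subgradient trainer (pure Python, to keep the example self-contained) stands in for whatever SVM library was actually used. Each mention is paired with earlier candidate antecedents, a linear decision value decides whether the pair is coreferent, and since coreference (two mentions referring to the same entity) is transitive, accepted links are merged into chains with a union-find structure.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Mention:
    # Hypothetical mention record; the paper's annotation schema is not shown here.
    text: str        # surface form
    sent: int        # index of the sentence containing the mention
    kind: str        # "name", "pronoun" or "np" (noun phrase)
    ne: str = "O"    # named-entity tag, "O" if none (tag names are placeholders)


def pair_features(a: Mention, b: Mention) -> List[float]:
    """Hypothetical features for a candidate antecedent a and an anaphor b."""
    return [
        1.0,                                           # constant feature (regularized bias)
        1.0 if a.text == b.text else 0.0,              # exact string match
        1.0 if a.ne == b.ne and a.ne != "O" else 0.0,  # same named-entity tag
        1.0 if b.kind == "pronoun" else 0.0,           # anaphor is a pronoun
        min(b.sent - a.sent, 5) / 5.0,                 # capped sentence distance
    ]


def score(w: List[float], x: List[float]) -> float:
    """Linear SVM decision value w . x (positive means 'coreferent')."""
    return sum(wj * xj for wj, xj in zip(w, x))


def train_pegasos(X, y, lam=0.01, epochs=100, seed=0) -> List[float]:
    """Linear SVM (hinge loss + L2 penalty) trained by Pegasos-style subgradient steps."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(X)), len(X)):   # one shuffled pass over the data
            t += 1
            eta = 1.0 / (lam * t)                     # Pegasos step size
            margin = y[i] * score(w, X[i])            # margin under the current w
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink step (regularization)
            if margin < 1.0:                          # hinge loss active: move toward example
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def resolve(mentions: List[Mention], w: List[float]) -> List[List[int]]:
    """Closest-first linking of each mention, then merge the links into chains."""
    parent = list(range(len(mentions)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for j in range(1, len(mentions)):
        for i in range(j - 1, -1, -1):      # nearest candidate antecedent first
            if score(w, pair_features(mentions[i], mentions[j])) > 0:
                parent[find(j)] = find(i)
                break
    chains = {}
    for k in range(len(mentions)):
        chains.setdefault(find(k), []).append(k)
    return [c for c in chains.values() if len(c) > 1]   # drop singletons


if __name__ == "__main__":
    doc = [
        Mention("تهران", 0, "name", "LOC"),
        Mention("علی", 0, "name", "PER"),
        Mention("او", 1, "pronoun"),
        Mention("علی", 2, "name", "PER"),
    ]
    # Hand-set, hypothetical weights: (bias, match, same NE, pronoun, distance)
    w = [-1.0, 2.0, 0.5, 1.5, -1.0]
    print(resolve(doc, w))  # [[1, 2, 3]]
```

In a real setup, the training pairs `X`, `y` would be extracted from the annotated corpus (for example, each mention paired with earlier mentions, labeled +1 if both belong to the same gold chain and -1 otherwise) and passed to `train_pegasos`; the hand-set weights above only make the toy output predictable.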

Keywords: automatic coreference resolution, reference resolution, pronoun anaphora resolution, referring expressions

Study type: Research | Subject: Text processing
Received: 1397/3/19 | Accepted: 1398/3/11 | Published: 1399/4/1 | Published online: 1399/4/1

Republication
	This article may be republished under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License.
