Volume 17, Issue 2 (9-2020), JSDP 2020, 17(2): 138-121


Sahlani H, Hourali M, Minaei-Bidgoli B. Coreference resolution with deep learning in the Persian language. JSDP 2020; 17(2): 138-121
URL: http://jsdp.rcisp.ac.ir/article-1-888-en.html
Malek Ashtar University of Technology
Abstract:
Coreference resolution is an advanced problem in natural language processing. With the spread of social networks, TV channels, news agencies, the Internet, and similar sources in daily life, reading all of their content, analyzing it, and finding relations within it is costly and time-consuming.
Text analysis is now performed using various natural language processing techniques. One of the challenges in this field is the low accuracy in detecting the referents of named entities, a task known as coreference resolution. Coreference resolution finds all expressions that refer to the same named entity; two expressions are coreferent when they belong to the same coreference cluster.
Coreference resolution can be used in many natural language processing tasks, such as question answering, text summarization, machine translation, and information extraction.
Coreference resolution methods fall into two main categories: machine learning and rule-based approaches. In rule-based approaches, a rich set of rules written by a specialist is executed to detect coreferences. These methods are fast but language-dependent, so the rules must be rewritten by a specialist for each new language. Machine learning methods are divided into supervised and unsupervised methods; a supervised approach requires data labeled by a specialist.
Coreference resolution includes three main phases: named entity recognition, feature extraction for the named entities, and coreference analysis. Of these, feature extraction is the primary phase.
After corpus creation, named entities must be recognized in the corpus. This step depends on the corpus: in some corpora, the entities are provided as gold data. In this paper, we used the RCDAT corpus, in which the named entities are already marked.
After the named entity recognition phase, the mention pairs are determined and their features are extracted. The proposed method uses two categories of features: word embedding vectors, and handcrafted features such as the distance between the mentions, head matching, and gender matching.
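The two feature categories described above can be illustrated with a minimal sketch. It is not the paper's implementation: the `Mention` record, its field names, the sentence-based distance, and the toy two-dimensional embeddings are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Mention:
    # Hypothetical mention record; the field names are illustrative, not from the paper.
    head: str               # head word of the mention
    gender: str             # "m", "f", or "unknown"
    sentence_index: int     # index of the sentence that contains the mention
    embedding: List[float]  # pre-computed word embedding of the head word


def pair_features(antecedent: Mention, anaphor: Mention) -> List[float]:
    """Concatenate both embeddings with three handcrafted features."""
    # Distance between the mentions, measured here in sentences (an assumption).
    distance = float(abs(anaphor.sentence_index - antecedent.sentence_index))
    # Head matching: 1.0 when both mentions share the same head word.
    head_match = 1.0 if antecedent.head == anaphor.head else 0.0
    # Gender matching: only counted when both genders are known.
    known = "unknown" not in (antecedent.gender, anaphor.gender)
    gender_match = 1.0 if known and antecedent.gender == anaphor.gender else 0.0
    return antecedent.embedding + anaphor.embedding + [distance, head_match, gender_match]


# Toy mentions with made-up 2-dimensional embeddings.
m1 = Mention(head="Sara", gender="f", sentence_index=0, embedding=[0.1, 0.2])
m2 = Mention(head="she", gender="f", sentence_index=1, embedding=[0.3, 0.4])
features = pair_features(m1, m2)
```

The resulting vector is the kind of input a pairwise classifier would consume; a real system would use much larger embeddings and more handcrafted features.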
This paper uses a deep neural network trained on the extracted features. In the coreference analysis phase, a feed-forward neural network (FFNN) is trained on the candidate mention pairs (the features extracted from them) and their labels (coreferent / non-coreferent, or 1/0), so that the trained FFNN assigns a probability between 0 and 1 to any given mention pair. A graph technique with a threshold is then used to determine which named entities are distinct or compatible within a coreference cluster. This step builds the graph from the mention pairs extracted in the previous step. In this graph, the nodes are the mention pairs, which are clustered with the agglomerative hierarchical clustering algorithm so that similar mention pairs are placed in the same group. The resulting clusters are taken as the coreference chains.
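The scoring and clustering steps can be sketched roughly as follows. This is a hedged illustration rather than the authors' code: the single hidden layer, the hand-set weights, the average-linkage criterion, and the 0.5 threshold are assumptions. For simplicity the sketch treats mentions as graph nodes and pair probabilities as edge weights, which may differ from the paper's exact graph construction.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Set, Tuple


def ffnn_probability(features: Sequence[float],
                     w_hidden: Sequence[Sequence[float]], b_hidden: Sequence[float],
                     w_out: Sequence[float], b_out: float) -> float:
    """Forward pass: one ReLU hidden layer, then a sigmoid output in (0, 1)."""
    hidden = [max(0.0, sum(w * f for w, f in zip(row, features)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    logit = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    return 1.0 / (1.0 + math.exp(-logit))


def cluster_mentions(n_mentions: int,
                     pair_prob: Dict[Tuple[int, int], float],
                     threshold: float = 0.5) -> List[Set[int]]:
    """Average-linkage agglomerative clustering over pairwise probabilities.

    pair_prob maps (i, j) with i < j to the coreference probability of that pair;
    missing pairs count as 0.0. Merging stops when no two clusters reach the threshold.
    """
    clusters: List[Set[int]] = [{i} for i in range(n_mentions)]

    def linkage(a: Set[int], b: Set[int]) -> float:
        scores = [pair_prob.get((min(i, j), max(i, j)), 0.0) for i in a for j in b]
        return sum(scores) / len(scores)

    while len(clusters) > 1:
        # Pick the pair of clusters with the highest average probability.
        x, y = max(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        if linkage(clusters[x], clusters[y]) < threshold:
            break
        clusters[x] |= clusters[y]  # merge y into x (x < y by construction)
        del clusters[y]
    return clusters


# Hand-set (untrained) weights, purely to show the shape of the computation.
p = ffnn_probability([1.0, 0.0], [[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0], [1.0, 1.0], -1.5)

# Made-up pair probabilities for four mentions.
probs = {(0, 1): 0.9, (0, 2): 0.8, (1, 2): 0.7, (0, 3): 0.1, (1, 3): 0.2, (2, 3): 0.1}
chains = cluster_mentions(4, probs)
```

With these toy probabilities, mentions 0, 1, and 2 end up in one chain and mention 3 stays a singleton. Raising the threshold yields smaller, more conservative chains.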
In this paper, the RCDAT Persian corpus is used to train the proposed coreference resolution approach, and the Uppsala Persian dataset is used for testing. To measure the accuracy of the system, different tools were used for feature extraction, each of which affects the accuracy of the whole system. The corpora, tools, and methods used in the system are standard and are comparable to the ACE and OntoNotes corpora and the tools used with them in coreference resolution algorithms. The result of the proposed method with its improvements (F1 = 62.09) is reported in the body of the paper.
Type of Study: Research | Subject: Paper
Received: 2018/08/21 | Accepted: 2019/09/2 | Published: 2020/09/14 | ePublished: 2020/09/14

Rights and permissions
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.