Volume 14, Issue 4 (3-2018)                   JSDP 2018, 14(4): 79-96




Sirjan University of Technology
Abstract:

There are two major types of treebanks: dependency-based and constituency-based. Both have applications in natural language processing and computational linguistics. Several dependency treebanks have been developed for Persian, but no large constituency treebank is available for the language. In this paper, we propose an algorithm for automatically converting a dependency treebank into a constituency treebank for Persian. Our method is based on an existing method, which we modify to improve its accuracy. The base algorithm constructs a constituency structure according to a set of conversion rules. Each rule maps a dependency relation to a constituency subtree, and the constituency structure is built by combining these subtrees. We investigate how the order in which dependency relations are processed affects the output constituency structure, and we show that the best order depends on the characteristics of the target language. We also modify the algorithm for matching the conversion rules: to match a dependency relation to a conversion rule, we start with detailed information and, if no match is found, reduce the level of detail and change the matching method. Finally, we modify the algorithm used for combining the constituency subtrees: we use statistical data derived from a treebank to find a proper position for attaching a constituency subtree to the projection chain of the head. The experimental results show that these modifications yield an improvement of 16.48% in the accuracy of the conversion algorithm.
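The two modifications described above (back-off rule matching and statistics-driven attachment) can be illustrated with a minimal sketch. It is not the authors' implementation: the rule table `RULES`, the counts in `ATTACH_COUNTS`, the relation and category labels, the back-off order, and the function names `match_rule` and `attachment_level` are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical conversion rules (not taken from the paper).
# Key: (dependency relation, head POS, dependent POS); None means "unspecified".
# Value: (label projected by the head, label projected by the dependent).
RULES = {
    ("SBJ", "V", "N"): ("S", "NP"),
    ("OBJ", "V", None): ("VP", "NP"),
    (None, "N", "ADJ"): ("NP", "ADJP"),
}

# Hypothetical attachment statistics, as if counted from a constituency treebank:
# for each (head POS, relation), how often the dependent attaches at each label.
ATTACH_COUNTS = {
    ("V", "SBJ"): Counter({"S": 40, "VP": 3}),
    ("V", "OBJ"): Counter({"VP": 55, "S": 2}),
}


def match_rule(rel, head_pos, dep_pos):
    """Try the most detailed key first, then back off to coarser keys."""
    candidates = [
        (rel, head_pos, dep_pos),   # full detail
        (rel, head_pos, None),      # drop the dependent's POS
        (None, head_pos, dep_pos),  # drop the relation label
    ]
    for key in candidates:
        if key in RULES:
            return RULES[key]
    return None  # no rule applies at any level of detail


def attachment_level(head_pos, rel, chain):
    """Pick the node on the head's projection chain seen most often for this relation.

    `chain` lists the labels of the projection chain from the head word upward.
    """
    counts = ATTACH_COUNTS.get((head_pos, rel), Counter())
    seen = [label for label in chain if counts[label] > 0]
    if not seen:
        return chain[-1]  # no statistics: fall back to the top of the chain
    return max(seen, key=lambda label: counts[label])
```

For example, `match_rule("OBJ", "V", "PRON")` finds no fully detailed rule and backs off to the `("OBJ", "V", None)` entry, while `attachment_level("V", "OBJ", ["V", "VP", "S"])` selects `"VP"` because that label has the highest count in the invented statistics.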
 
 

Type of Study: Research | Subject: Paper
Received: 2016/02/21 | Accepted: 2017/10/25 | Published: 2018/03/13 | ePublished: 2018/03/13


Rights and permissions
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.