An annotation scheme for Persian based on Autonomous Phrases Theory and Universal Dependencies

Baradaran Hemmati, Navid; Tabibzadeh, Omid

doi:10.29252/jsdp.16.3.60

Volume 16, Issue 3 (12-2019) JSDP 2019, 16(3): 49-60



Baradaran Hemmati N, Tabibzadeh O. An annotation scheme for Persian based on Autonomous Phrases Theory and Universal Dependencies. JSDP 2019; 16(3): 49-60
URL: http://jsdp.rcisp.ac.ir/article-1-538-en.html

Navid Baradaran Hemmati*, Omid Tabibzadeh

Bu-Ali Sina University, Hamadan

Abstract:

A treebank is a corpus with linguistic annotations above the level of parts of speech. During the first half of the 2010s, three treebanks were developed for Persian that are based on dependency grammar, either originally or through later conversion: Persian Treebank (PerTreeBank), Persian Syntactic Dependency Treebank, and Uppsala Persian Dependency Treebank (UPDT). In these corpora, the syntactic analysis of a sentence consists of a series of relations, each of which presents one word in the sentence as a dependent of another word, referred to as its head.
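The head-dependent representation described above can be sketched in a few lines of Python. This is only an illustration: the `Token` class and the `head_dependent_pairs` helper are hypothetical names, and the English sentence with its widely used UD labels (`nsubj`, `root`, `obj`) is not taken from any of the Persian treebanks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    index: int     # 1-based position in the sentence
    form: str      # surface word form
    head: int      # index of the head token; 0 marks the root
    relation: str  # label of the relation to the head


# Illustrative English sentence, not drawn from the Persian treebanks.
sentence = [
    Token(1, "She", 2, "nsubj"),
    Token(2, "reads", 0, "root"),
    Token(3, "books", 2, "obj"),
]


def head_dependent_pairs(tokens):
    """Return (head form, dependent form, relation) for every token."""
    forms = {t.index: t.form for t in tokens}
    forms[0] = "ROOT"  # artificial head of the root token
    return [(forms[t.head], t.form, t.relation) for t in tokens]


pairs = head_dependent_pairs(sentence)
for head, dependent, relation in pairs:
    print(f"{relation}({head}, {dependent})")
```

Every word appears exactly once as a dependent, so a sentence of n words yields n relations.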
An examination of head-dependent pairs extracted from similar contexts in these treebanks reveals frequent and apparently systematic inconsistencies, particularly where the head is a noun or an adjective. These can be explained by the failure to postulate valency structures for nouns and adjectives as well as for verbs, on the tacit assumption that tokens will receive the proper labels without them.
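One way such a consistency check might be carried out is sketched below. It is a minimal sketch and not the authors' procedure: the records, the context descriptions, and the placeholder labels (`label_x`, `label_y`, `label_z`) are all invented for illustration, and `find_inconsistent_contexts` is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical records from one treebank:
# (head part of speech, dependent part of speech, context, assigned label).
records = [
    ("NOUN", "ADP", "noun + prepositional phrase", "label_x"),
    ("NOUN", "ADP", "noun + prepositional phrase", "label_y"),
    ("VERB", "NOUN", "verb + direct object", "label_z"),
    ("VERB", "NOUN", "verb + direct object", "label_z"),
]


def find_inconsistent_contexts(rows):
    """Group pairs by context and keep contexts that received more than one label."""
    labels_by_context = defaultdict(set)
    for head_pos, dep_pos, context, label in rows:
        labels_by_context[(head_pos, dep_pos, context)].add(label)
    return {
        key: sorted(labels)
        for key, labels in labels_by_context.items()
        if len(labels) > 1
    }


inconsistent = find_inconsistent_contexts(records)
for key, labels in inconsistent.items():
    print(key, "->", labels)
```

In this toy data only the nominal head is flagged, which mirrors the kind of pattern the abstract reports without reproducing any actual finding.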
When the notion of valency was borrowed from chemistry to refer to the number of arguments a word controls, it was meant to apply only to verbs. Later developments of dependency grammar proposed nominal and adjectival valency as well, but the significance of the idea seems to have been underestimated. As a result, developers of dependency treebanks have been unlikely to design their annotation schemes around it.
As far as Persian is concerned, Uppsala Persian Dependency Treebank and Dependency Persian Treebank (DepPerTreeBank, the dependency version of PerTreeBank) use the Stanford Typed Dependencies, and the later version of the former, Persian Universal Dependency Treebank, uses the Universal Dependencies. Both are standard annotation schemes that do not recognize valency for nouns and adjectives. Persian Syntactic Dependency Treebank uses its own set of dependency relations, which likewise pays little attention to the idea.
This paper reports the design of a scheme for annotating Persian dependency structure, as part of an ongoing project to develop a dependency treebank for Persian. The scheme is based on a comprehensive description of Persian syntax according to the Autonomous Phrases Theory. The theory's main ideas are that phrases deserve recognition in dependency analyses because of their cognitive reality, and that valency extends beyond verbs, so that every dependent of any head type is classified as either a complement or an adjunct. To make the resulting scheme reasonably intelligible to its target audience, the latest standard annotation scheme available, Universal Dependencies (UD), was adapted to the requirements of this framework.
The outcome is a tag set of fifty-three dependency relations, fifteen of which are original labels and the rest borrowed from Universal Dependencies. Although the scheme makes finer distinctions than UD and thus provides more detailed annotation, it does not need many more tags than UD, mainly because a large number of the additional relations are shared by two or three head types.

Keywords: annotation, Persian, treebank, Universal Dependencies, valency

Type of Study: Research | Subject: Paper
Received: 2018/06/22 | Accepted: 2019/09/04 | Published: 2020/01/07 | ePublished: 2020/01/07

Rights and permissions
	This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
