A treebank is a corpus with linguistic annotations above the level of parts of speech. During the first half of the present decade, three treebanks were developed for Persian that are based on dependency grammar, either originally or through later conversion: the Persian Treebank (PerTreeBank), the Persian Syntactic Dependency Treebank, and the Uppsala Persian Dependency Treebank (UPDT). In these corpora, the syntactic analysis of a sentence consists of a series of relations, each of which presents one word in the sentence as a dependent of another word, referred to as its head.
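The head-dependent representation described above can be sketched in a few lines of Python. This is only an illustrative data structure, not the storage format of any of the treebanks named here; the `Token` class, the `head_dependent_pairs` helper, and the English toy sentence are assumptions introduced for the example, and the labels `nsubj`, `root`, and `obj` are used in the Universal Dependencies sense.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """One word of a sentence in a dependency analysis (hypothetical layout)."""
    index: int      # 1-based position in the sentence
    form: str       # the word as it appears in the text
    head: int       # index of the head token; 0 marks the root of the sentence
    relation: str   # label of the relation between this token and its head


# Toy English sentence, chosen only to keep the illustration readable.
sentence = [
    Token(1, "She", 2, "nsubj"),
    Token(2, "reads", 0, "root"),
    Token(3, "books", 2, "obj"),
]


def head_dependent_pairs(tokens):
    """Return (head form, dependent form, relation) for every non-root token."""
    by_index = {t.index: t for t in tokens}
    return [
        (by_index[t.head].form, t.form, t.relation)
        for t in tokens
        if t.head != 0
    ]


pairs = head_dependent_pairs(sentence)
```

Every token except the root contributes exactly one pair, which is what makes the head-dependent pairs of different treebanks directly comparable.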
An examination of head-dependent pairs extracted from similar contexts in these treebanks reveals frequent, apparently systematic inconsistencies, particularly where the head is a noun or an adjective. These can be explained by the failure to postulate valency structures for nouns and adjectives as well as for verbs, on the tacit assumption that tokens will receive the proper labels even without such structures.
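One way such inconsistencies could be surfaced is to group head-dependent pairs by context and flag every context that carries more than one label. The sketch below assumes a deliberately simple notion of context (head part of speech, dependent part of speech, and a linking marker); the function name, the context definition, and the placeholder values such as `label_A` and `prep_x` are hypothetical and are not taken from the treebanks or from the procedure actually followed in this study.

```python
from collections import defaultdict


def find_inconsistent_contexts(observations):
    """Map each context to its labels, keeping only contexts with several labels.

    Each observation is a tuple (head_pos, dependent_pos, marker, label).
    """
    labels_by_context = defaultdict(set)
    for head_pos, dep_pos, marker, label in observations:
        labels_by_context[(head_pos, dep_pos, marker)].add(label)
    return {
        context: sorted(labels)
        for context, labels in labels_by_context.items()
        if len(labels) > 1
    }


# Placeholder observations: the labels and markers are invented for illustration.
observations = [
    ("NOUN", "NOUN", "prep_x", "label_A"),
    ("NOUN", "NOUN", "prep_x", "label_B"),  # same context, different label
    ("ADJ", "NOUN", "prep_y", "label_A"),
    ("VERB", "NOUN", "prep_x", "label_C"),
    ("VERB", "NOUN", "prep_x", "label_C"),  # same context, same label
]

inconsistent = find_inconsistent_contexts(observations)
```

In this toy input only the nominal-head context is flagged, since it is the only one annotated with two different labels.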
When the notion of valency was borrowed from chemistry to refer to the number of arguments a word controls, it was meant to apply only to verbs. Later developments of dependency grammar proposed nominal and adjectival valency as well. The significance of this idea seems to have been underestimated, however, so developers of dependency treebanks have been unlikely to build it into their annotation schemes.
As far as Persian is concerned, the Uppsala Persian Dependency Treebank and the Dependency Persian Treebank (DepPerTreeBank, the dependency version of PerTreeBank) use the Stanford Typed Dependencies. The later version of the Uppsala treebank, the Persian Universal Dependency Treebank, uses Universal Dependencies. Both are standard annotation schemes that do not recognize valency for nouns and adjectives. The Persian Syntactic Dependency Treebank, in turn, uses its own set of dependency relations, in which little attention is paid to the idea.
This paper reports the design of a scheme for annotating Persian dependency structure, as part of an ongoing project to develop a dependency treebank for Persian. The scheme was based on a comprehensive description of Persian syntax according to a theory introduced as the Autonomous Phrases Theory. Its main idea is that phrases deserve a place in dependency analyses because of their cognitive reality. The notion of valency is also extended beyond verbs, so that every dependent, whatever the type of its head, is classified as either a complement or an adjunct. Moreover, to make the resulting annotation scheme reasonably intelligible to its target audience, the latest standard annotation scheme available, Universal Dependencies (UD), was adapted to suit the requirements of the adopted framework.
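The complement/adjunct distinction can be illustrated with a minimal sketch, assuming that valency is recorded in a lexicon that lists, for each head, the dependents it licenses. The `VALENCY` table, its English entries, and the `classify_dependent` function are hypothetical and stand in for whatever valency description the scheme actually relies on; the point is only that the same test applies to verbal, nominal, and adjectival heads alike.

```python
# Hypothetical valency lexicon: (head lemma, head part of speech) -> licensed markers.
# English entries are used only to keep the example readable.
VALENCY = {
    ("rely", "VERB"): {"on"},
    ("destruction", "NOUN"): {"of"},
    ("fond", "ADJ"): {"of"},
}


def classify_dependent(head_lemma, head_pos, marker):
    """Return 'complement' if the head licenses the marker, else 'adjunct'."""
    licensed = VALENCY.get((head_lemma, head_pos), set())
    return "complement" if marker in licensed else "adjunct"


examples = [
    ("rely", "VERB", "on"),           # licensed by the verb
    ("destruction", "NOUN", "of"),    # licensed by the noun
    ("fond", "ADJ", "of"),            # licensed by the adjective
    ("fond", "ADJ", "in"),            # not licensed, hence an adjunct
]
results = [classify_dependent(*example) for example in examples]
```

A head with no entry in the lexicon licenses nothing in this sketch, so all of its dependents come out as adjuncts.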
The outcome was a tag set of fifty-three dependency relations: fifteen original labels and the remaining thirty-eight borrowed from Universal Dependencies. Although the scheme makes finer distinctions than UD and therefore provides more detailed annotation, it does not require many more tags than UD does, mainly because a large number of the additional relations are shared by two or three head types.
Rights and permissions
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.