Volume 15, Issue 4 (3-2019)                   JSDP 2019, 15(4): 123-130





Badpeima M, Hourali F, Hourali M. Part of Speech Tagging of Persian Language using Fuzzy Network Model. JSDP. 2019; 15(4): 123-130
URL: http://jsdp.rcisp.ac.ir/article-1-536-en.html
Abstract:

Part-of-speech (POS) tagging is an active research topic in natural language processing (NLP). The process of classifying words into their parts of speech and labeling them accordingly is known as part-of-speech tagging, POS tagging, or simply tagging. Parts of speech are also known as word classes or lexical categories. The purpose of POS tagging is to determine the grammatical category of each word in a sentence; the grammatical and syntactic features of words are then determined from these tags.
The performance of existing tagging methods depends on the corpus. When the training and test data are drawn from the same corpus, the methods perform well; when the amount of training data is small, accuracy decreases, especially for probabilistic methods. Words in sentences are often ambiguous. For example, the word 'Mahrami' can be a noun or an adjective. Such ambiguity can be resolved by using neighboring words and an appropriate tagging method.
Methods in this domain fall into several categories, such as memory-based [2], rule-based [5], statistical [6], and neural network [7] methods. Most of these methods reach an average precision of about 95% [1]. In [13], the TnT probabilistic tagger, with smoothing and variations on the estimation of the trigram likelihood function, reached 96.7% overall on the Penn Treebank and NEGRA corpora. In [14], a dependency network representation, extensive use of lexical features (such as jointly conditioning on multiple consecutive words), effective use of priors in conditional log-linear models, and fine-grained modeling of unknown words together achieved 97.24% accuracy on the Penn Treebank WSJ.
The first work on Farsi used word neighborhoods and the similarity distribution between them; the accuracy of that system is 57.5%. In [19], an open-source tagger called HunPos was proposed for Persian. This tagger uses the same method as TnT, based on the hidden Markov model and word trigrams, and reached 96.9% on the "Bijankhan" corpus.
In this paper, a statistical method is proposed for Persian POS tagging. The limitations of statistical methods are reduced by introducing a fuzzy network model, so that the model can estimate more reliable parameters from a small set of training data. In this method, normalization is performed as a preprocessing step, and the frequency of each word is then estimated as a fuzzy function with respect to the corresponding tag. Next, the fuzzy network model is formed, and the weight of each edge is determined by means of a neural network and a membership function. Finally, once the fuzzy network model for a sentence has been constructed, the Viterbi algorithm, commonly used with Hidden Markov Models (HMMs), is applied to find the most probable path in the network.
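The final decoding step can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how edge weights are combined along a path, so the sketch assumes they are multiplied (summed in log space), as in a standard HMM. The weight tables, the `floor` value for unseen edges, and the toy English tokens are hypothetical placeholders; in the paper the weights come from a neural network and a membership function, which are not reproduced here.

```python
import math

def viterbi(words, tags, start_w, trans_w, emit_w, floor=1e-6):
    """Return the highest-scoring tag sequence for `words`.

    All weights are assumed to lie in (0, 1]; missing entries fall back
    to `floor`. Path scores are sums of log-weights (an assumption).
    """
    if not words:
        return []

    def lw(table, key):
        return math.log(max(table.get(key, floor), floor))

    # best[i][t] = (score of the best path ending in tag t at position i,
    #               previous tag on that path)
    best = [{t: (lw(start_w, t) + lw(emit_w, (words[0], t)), None) for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            prev, score = max(
                ((p, best[i - 1][p][0] + lw(trans_w, (p, t))) for p in tags),
                key=lambda item: item[1],
            )
            column[t] = (score + lw(emit_w, (words[i], t)), prev)
        best.append(column)

    # Follow back-pointers from the best final tag.
    last = max(tags, key=lambda t: best[-1][t][0])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        last = best[i][last][1]
        path.append(last)
    return path[::-1]

# Toy lattice (hypothetical weights): "light" is ambiguous between ADJ and N.
TAGS = ["DET", "ADJ", "N"]
START = {"DET": 0.8, "ADJ": 0.1, "N": 0.1}
TRANS = {("DET", "ADJ"): 0.4, ("DET", "N"): 0.6, ("ADJ", "N"): 0.9, ("N", "N"): 0.1}
EMIT = {("the", "DET"): 1.0, ("light", "ADJ"): 0.5, ("light", "N"): 0.5, ("book", "N"): 1.0}

print(viterbi(["the", "light", "book"], TAGS, START, TRANS, EMIT))
```

In this toy lattice the neighboring words resolve the ambiguity: the ADJ-to-N edge outweighs the N-to-N edge, so the ambiguous token is tagged ADJ.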
The goal of this paper is to address a weakness of probabilistic methods: when training data is scarce, the estimates made by these models are unreliable.
The results of testing this method on the "Bijankhan" corpus show that the proposed method performs better than similar methods, such as the hidden Markov model, when fewer training examples are available. In this experiment, the data is split several times into training and test groups of different, increasing sizes: in the initial experiments the training set was reduced, in subsequent experiments its size was increased, and each result was compared with the HMM algorithm.
As shown in Figure 4, the size of the training set and the test error are directly related: the error rate decreases as the training set grows, and vice versa. Three criteria, precision, recall, and F1, were used in the tests. Table 4 compares the results of the HMM and fuzzy network implementations.
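The three evaluation criteria can be computed per tag from aligned gold and predicted tag sequences. The sketch below is illustrative only: the abstract does not state how the scores were averaged across tags, so no averaging is performed, and the function name and toy tag sequences are hypothetical.

```python
from collections import Counter

def per_tag_prf(gold, pred):
    """Return {tag: (precision, recall, f1)} for two aligned tag sequences."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # p was predicted but is wrong
            fn[g] += 1  # g should have been predicted

    scores = {}
    for tag in sorted(set(gold) | set(pred)):
        predicted = tp[tag] + fp[tag]
        actual = tp[tag] + fn[tag]
        precision = tp[tag] / predicted if predicted else 0.0
        recall = tp[tag] / actual if actual else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        scores[tag] = (precision, recall, f1)
    return scores

# Toy example: one N token is mistagged as ADJ.
gold_tags = ["N", "N", "ADJ", "V"]
pred_tags = ["N", "ADJ", "ADJ", "V"]
for tag, (p, r, f) in per_tag_prf(gold_tags, pred_tags).items():
    print(f"{tag}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```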
 

Type of Study: Research | Subject: Paper
Received: 2016/12/21 | Accepted: 2019/01/9 | Published: 2019/03/8 | ePublished: 2019/03/8

References
[1] M. R. Feizi Derakhshi, F. Firozi, M. Rahimi, "Comparison of Works Performed on the Persian Part-of-Speech Tagging," Computational Linguistics, 3rd National Conference on Computer Linguistics, Sharif University of Technology, 2014.
[2] M. Hosseini, "Automatic labeling system and automatic disambiguation of the components of the word for the textual form of Persian language," MA, Iran University of Science and Technology, Tehran, 2008.
[3] M. BijanKhan, "The Role of the Corpus in Writing a Grammar: An Introduction to a Software," Iranian Journal of Linguistics, 19(2), 2004.
[4] G. D. Forney, "The Viterbi algorithm," Proceedings of the IEEE, pp. 268-278, 1973. [DOI:10.1109/PROC.1973.9030]
[5] E. Brill, "A simple rule-based part of speech tagger," In Proceedings of the 3rd Conference on Applied Natural Language Processing (ANLP-92), pp. 153-155, 1992. [DOI:10.3115/974499.974526]
[6] K. W. Church, "A stochastic PARTS program and noun phrase parser for unrestricted text," In Proceedings of Applied Natural Language Processing, pp. 136-143, 1988. [DOI:10.3115/974235.974260]
[7] J. Benello, A. W. Mackie, and J. A. Anderson, "Syntactic category disambiguation with neural networks," Computer Speech and Language, vol. 3, pp. 203-217, 1989. [DOI:10.1016/0885-2308(89)90018-1]
[8] H. Itakura and Y. Nishikawa, "Fuzzy network technique for technological forecasting," Fuzzy Sets and Systems, pp. 99-113, 1984. [DOI:10.1016/0165-0114(84)90094-0]
[9] H. Kawamura, "Fuzzy network for decision support systems," Fuzzy Sets and Systems, pp. 59-72, 1993. [DOI:10.1016/0165-0114(93)90322-9]
[10] S. Chanas and W. Kolodziejczyk, "Maximum flow in a network with fuzzy arc capacities," Fuzzy Sets and Systems, pp. 165-173, 1982. [DOI:10.1016/0165-0114(82)90006-9]
[11] R. Sedgewick, "Algorithms in C," Addison-Wesley Publishing Company, 1990.
[12] H.-J. Zimmermann, "Fuzzy Set Theory and Its Applications," Kluwer-Nijhoff Publishing, pp. 61-82, 1985. [DOI:10.1007/978-94-015-7153-1_6]
[13] T. Brants, "TnT - a statistical part-of-speech tagger," In Proceedings of the 6th Conference on Applied Natural Language Processing, pp. 224-231, 2000.
[14] K. Toutanova, D. Klein, Ch. D. Manning, and Y. Singer, "Feature-Rich Part-of-Speech Tagging with a Cyclic Dependency Network," 2003. [DOI:10.3115/1073445.1073478]
[15] J. Giménez and L. Màrquez, "A general POS tagger generator based on support vector machines," In Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC 2004), Lisbon, Portugal.
[16] H. Tseng, D. Jurafsky, and Ch. Manning, "Morphological features help POS tagging of unknown words across language varieties," Fourth SIGHAN Workshop on Chinese Language Processing, pp. 32-39, 2005.
[17] P. Halácsy, A. Kornai, and C. Oravecz, "HunPos - an open source trigram tagger," In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, Posters, Prague, Czech Republic, 2007. [DOI:10.3115/1557769.1557830]
[18] S. Mostafa Assi and M. Haji Abdolhosseini, "Grammatical Tagging of a Persian Corpus," Institute for Humanities and Cultural Studies, 2000.
[19] M. Seraji, "A Statistical Part-of-Speech Tagger for Persian," Department of Linguistics and Philology, NODALIDA 2011, Riga, Latvia, May 11-13, 2011.
[20] J.-H. Kim and G. C. Kim, "Fuzzy network model for part-of-speech tagging under small training data," Natural Language Engineering 2.02 (1996), pp. 95-110. [DOI:10.1017/S1351324996001258]



© 2015 All Rights Reserved | Signal and Data Processing