Volume 7, Issue 1 (9-2010)                   JSDP 2010, 7(1): 77-88

Persian name entity recognition and classification. JSDP. 2010; 7 (1) :77-88
URL: http://jsdp.rcisp.ac.ir/article-1-731-en.html
Abstract:

Named entity recognition (NER) identifies one or more kinds of names in a text and classifies them into specified categories. These categories can include the names of people, organizations, companies, and places (country, city, street, etc.), time-related names (date and time), financial values, percentages, and so on. Although much research has been done on NER in different languages over the past decade, the lack of a system with acceptable performance on Farsi texts is clearly felt. In this paper, the Corpus of the Research Center of Intelligent Signal Processing is used to build a Farsi NER system. The proposed system has three stages: preprocessing, feature extraction, and classification. In the preprocessing stage, names are extracted from the text using the part-of-speech (POS) feature, and then infinitives, time-related names, counting names, and numbers are removed from the data. This gives a more balanced data set for learning and classification. In the feature extraction stage, N-grams are computed as features, and in the classification stage four classifiers (linear, KNN, Bayesian, and neural network) are trained. Because time-related names show little variety and there are few cases in which they mix with names in the other categories, an auxiliary list is used to identify them. The results show that the neural network performs best (99%) in distinguishing between the names of places and people. In general, the KNN and linear classifiers achieve an F-measure of 91% in classifying the names of places, the names of people, and general names. In classifying time-related names with the auxiliary list, an F-measure of 96% was obtained.
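
The feature extraction and classification stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say whether the N-grams are computed over characters or words, so character bigrams are assumed here. The cosine-similarity k-NN vote, the transliterated toy names, the `TIME_WORDS` auxiliary list, and all function names are hypothetical choices made only for illustration.

```python
from collections import Counter
from math import sqrt

# Hypothetical auxiliary list of time-related names (transliterated).
TIME_WORDS = {"shanbeh", "farvardin", "ordibehesht"}

# Hypothetical toy training set of (name, label) pairs.
TRAINING = [
    ("tehran", "place"), ("shiraz", "place"), ("tabriz", "place"),
    ("ali", "person"), ("reza", "person"), ("hasan", "person"),
]


def char_ngrams(word, n=2):
    """Bag of character n-grams, with '^' and '$' marking word boundaries."""
    padded = "^" + word + "$"
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram bags."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(word, training, k=3, n=2):
    """Majority label among the k training names most similar to `word`."""
    query = char_ngrams(word, n)
    ranked = sorted(
        training,
        key=lambda item: cosine(query, char_ngrams(item[0], n)),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def classify(word, training=TRAINING, k=3):
    """Check the auxiliary time list first, then fall back to k-NN."""
    if word in TIME_WORDS:
        return "time"
    return knn_predict(word, training, k=k)


def f_measure(gold, predicted, positive):
    """F1 score (harmonic mean of precision and recall) for one class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

A call such as `classify("farvardin")` is resolved by the list lookup alone, while any other name goes through the n-gram k-NN vote. `f_measure(gold, predicted, "place")` then scores one category at a time, which is one plausible reading of the F-measure figures reported above.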

Type of Study: Research | Subject: Paper
Received: 2010/09/22 | Accepted: 2018/02/19 | Published: 2018/02/19 | ePublished: 2018/02/19

© 2015 All Rights Reserved | Signal and Data Processing