Volume 14, Issue 2 (9-2017) | JSDP 2017, 14(2): 59-74

DOI: 10.18869/acadpub.jsdp.14.2.59


Sajadi S M B, Rashidi H, Minaei bidgoli B. A New Approach for Extracting Named Entity in Classical Arabic. JSDP. 2017; 14 (2) :59-74
URL: http://jsdp.rcisp.ac.ir/article-1-295-en.html

PhD student, Islamic Azad University, Central Tehran Branch
Abstract:

In Natural Language Processing (NLP), developing resources and tools extends the reach and effectiveness of research in each language. In recent years, Arabic Named Entity Recognition (ANER) has attracted the attention of NLP researchers. While most of this work targets Modern Standard Arabic (MSA), this paper focuses on Classical Arabic (CA) literature. We present NoorCorp, a corpus of 200k labeled words for research purposes, annotated manually by human experts. We also collected about 18k proper names from old Hadith books into a gazetteer called NoorGazet. Using ensemble learning, we develop a new approach for extracting named entities (NEs), including persons, locations, and organizations. The AdaBoost.M2 algorithm, an implementation of the multiclass boosting method, is used to train the prediction model. Results show that the method outperforms the decision tree used as its base classifier. We use tokenization, part-of-speech (POS) tagging, and base phrase chunking (BPC) to overcome linguistic obstacles in Arabic. An overall F-measure of 86.85 is obtained. Finally, the proposed approach is applied to ANERCorp, an MSA corpus, and the results are compared with those obtained on NoorCorp.
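For readers unfamiliar with the F-measure reported above, the following is a minimal sketch of how such a score can be computed from gold and predicted labels. The abstract does not say whether the score is token-level or entity-level, or how it is averaged, so the token-level micro-average used here is an assumption. The label names `PER`, `LOC`, `ORG`, and `O`, the function `f_measure`, and the toy data are hypothetical and are not taken from NoorCorp.

```python
from collections import Counter

# Hypothetical label set: person, location, organization ("O" = not an entity).
ENTITY_TYPES = ("PER", "LOC", "ORG")


def f_measure(gold, predicted, entity_types=ENTITY_TYPES):
    """Return micro-averaged (precision, recall, F1) over token labels.

    Assumption: scoring is token-level and micro-averaged; the paper's
    actual evaluation protocol may differ.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    counts = Counter()
    for g, p in zip(gold, predicted):
        if p in entity_types and p == g:
            counts["tp"] += 1          # correct entity label
        else:
            if p in entity_types:
                counts["fp"] += 1      # predicted an entity that is wrong
            if g in entity_types:
                counts["fn"] += 1      # missed (or mislabeled) a gold entity
    tp, fp, fn = counts["tp"], counts["fp"], counts["fn"]
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy data for illustration only.
gold = ["PER", "O", "LOC", "O", "ORG", "PER"]
pred = ["PER", "O", "O", "LOC", "ORG", "LOC"]
precision, recall, f1 = f_measure(gold, pred)
```

Multiplying `f1` by 100 gives a value on the same scale as the 86.85 quoted in the abstract.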

Type of Study: Research | Subject: Paper
Received: 2014/12/1 | Accepted: 2017/03/24 | Published: 2017/10/21 | ePublished: 2017/10/21

