Volume 16, Issue 3 (12-2019)                   JSDP 2019, 16(3): 128-117




Dastgheib M, Koleini S, Fakhrahmad S. Design and implementation of Persian spelling detection and correction system based on Semantic. JSDP 2019; 16(3): 128-117
URL: http://jsdp.rcisp.ac.ir/article-1-668-en.html
Abstract:

Persian has special features in electronic text (graphemes, homophones, and multi-shape joined characters). Furthermore, designing and implementing NLP tools for Persian is more challenging than for other languages (e.g. English or German). Spelling tools are widely used for editing user texts such as emails and documents in text editors, and developing Persian tools allows Persian programs to check spelling and reduce errors in electronic texts. In this work, we review spelling detection and correction methods, especially for the Persian language. The proposed algorithm consists of two steps. The first step is non-word error detection and correction by an intelligent scoring algorithm. The second step is real-word error detection and correction. We propose a spelling system, "Perspell", for Persian non-word and real-word errors that uses a hybrid scoring system and a language model optimized by a lexicon. The scoring system uses a combination of lexical and semantic features, and the weights of these features are optimized in a learning phase on a training dataset. Perspell is compared with known Persian spell-checker systems and outperforms them in detection and correction precision. The proposed system can also detect and correct real-word errors, an open challenge that is complicated and time-consuming in Persian. In the assessment of the proposed method, the F-measure improved significantly (about 10%) for detecting and correcting Persian words. We used a Persian language model with bootstrapping and smoothing to overcome data sparseness and lack of data. The bootstrapping is built from a Persian dictionary, and we further use word sense disambiguation to select the correct replacement word.
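The two-step flow described in the abstract can be illustrated with a minimal sketch. This is not the Perspell implementation: the class `ToySpeller`, the weights `w_lex` and `w_ctx`, and the threshold `ratio` are hypothetical names chosen for illustration. The weights are fixed by hand here rather than learned, the context feature is a plain add-one smoothed bigram model standing in for the paper's language model, the semantic features and word sense disambiguation are not modeled, and an English toy corpus is used in place of Persian text.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row (Wagner-Fischer)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


class ToySpeller:
    """Hypothetical two-step checker: non-word errors, then real-word errors."""

    def __init__(self, sentences, w_lex=0.5, w_ctx=0.5, ratio=2.0):
        # w_lex, w_ctx and ratio are hand-picked assumptions, not learned values.
        self.w_lex, self.w_ctx, self.ratio = w_lex, w_ctx, ratio
        self.unigrams, self.bigrams = Counter(), Counter()
        for sentence in sentences:
            tokens = ["<s>"] + sentence.split()
            self.unigrams.update(tokens)
            self.bigrams.update(zip(tokens, tokens[1:]))
        self.lexicon = set(self.unigrams) - {"<s>"}

    def bigram_prob(self, prev, word):
        # Add-one smoothing keeps unseen bigrams from getting zero probability.
        return (self.bigrams[(prev, word)] + 1) / (self.unigrams[prev] + len(self.lexicon))

    def candidates(self, word, max_distance=1):
        # Sorted so that ties are broken deterministically.
        return sorted(w for w in self.lexicon
                      if w != word and edit_distance(word, w) <= max_distance)

    def score(self, prev, word, cand):
        # Hybrid score: lexical closeness plus contextual fit.
        lexical = 1.0 / (1 + edit_distance(word, cand))
        return self.w_lex * lexical + self.w_ctx * self.bigram_prob(prev, cand)

    def correct(self, sentence):
        out, prev = [], "<s>"
        for tok in sentence.split():
            cands, best = self.candidates(tok), tok
            if tok not in self.lexicon:
                # Step 1: non-word error, pick the best-scoring lexicon word.
                if cands:
                    best = max(cands, key=lambda c: self.score(prev, tok, c))
            elif cands:
                # Step 2: real-word error, replace only if a neighbour fits
                # the context clearly better than the word as typed.
                top = max(cands, key=lambda c: self.bigram_prob(prev, c))
                if self.bigram_prob(prev, top) > self.ratio * self.bigram_prob(prev, tok):
                    best = top
            out.append(best)
            prev = best
        return " ".join(out)


corpus = [
    "the whole world is watching",
    "the whole world knows",
    "each word has a meaning",
    "spell each word correctly",
]
speller = ToySpeller(corpus)
print(speller.correct("each worl has a meaning"))     # non-word error
print(speller.correct("the whole word is watching"))  # real-word error
```

In this toy setup the non-word `worl` is equally close to `word` and `world`, so the context term decides between them, and the valid word `word` is replaced by `world` only because the preceding word makes `world` more than `ratio` times as likely.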
 

Type of Study: Applicable | Subject: Paper
Received: 2017/05/8 | Accepted: 2019/06/19 | Published: 2020/01/7 | ePublished: 2020/01/7



Rights and permissions
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

© 2015 All Rights Reserved | Signal and Data Processing