Volume 14, Issue 2 (9-2017) | JSDP 2017, 14(2): 3-24

DOI: 10.18869/acadpub.jsdp.14.2.3

Maskanati S, Keshavarz A. Online Persian Hand Writing Recognition Using Language Model and Reduction of User Writing Rules. JSDP. 2017; 14 (2) :3-24
URL: http://jsdp.rcisp.ac.ir/article-1-428-en.html

Assistant Professor, Persian Gulf University
Abstract:

The joined, cursive form of Persian words, the wide variety of Persian scripts, and the different shapes that Persian letters take depending on their position in a word make Persian handwriting recognition a serious challenge. The main weakness of most recognition methods is that they ignore sentence context, so when an input word is misrecognized, a word that looks correct can end up in a sentence where it does not fit. A solution that analyzes sentence context properly requires large linguistic resources that represent the target language well. This article presents a new method for online recognition of Persian words that tries to improve recognition by using word context. The Persian vocabulary is divided into two groups. The first group contains the words whose sub-words are all supported by the database of handwritten sub-word classes; these words make up 68.2% of the total vocabulary, and the hypotheses scored at the recognition stage are members of this group. The second group contains the words that are not supported by the database. Clearly, a recognition system that does not handle this second group cannot recognize more than 30 percent of the words of the language. At the recognition stage, the signs are detected and a sign label is produced. At this stage, the same label is also used to select words whose signs match those of the input word (these words are chosen from those not supported at the recognition stage). Hypotheses are scored by combining recognition scores and language model scores. For these added words, however, recognition scores cannot be calculated, because the sub-words of the hypotheses are absent from the database. Therefore, the scores of the words recognized in the previous steps are used instead.
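The split of the vocabulary into supported and unsupported words described above can be illustrated with a minimal Python sketch. The abstract does not say how words are segmented into sub-words, so the rule used here (a sub-word ends after a letter that does not join to the following letter) is an assumption, and the names `split_subwords` and `partition_vocabulary` are hypothetical.

```python
# Assumption: letters that do not join to the following letter close a sub-word.
NON_JOINING = set("اآدذرزژو")


def split_subwords(word):
    """Split a word into connected sub-words (illustrative rule, not the authors')."""
    parts, current = [], ""
    for ch in word:
        current += ch
        if ch in NON_JOINING:
            parts.append(current)
            current = ""
    if current:
        parts.append(current)
    return parts


def partition_vocabulary(vocabulary, supported_subwords):
    """Return (supported, unsupported) word lists.

    A word counts as supported only if every one of its sub-words
    appears in the set of sub-word classes held by the database.
    """
    supported, unsupported = [], []
    for word in vocabulary:
        if all(part in supported_subwords for part in split_subwords(word)):
            supported.append(word)
        else:
            unsupported.append(word)
    return supported, unsupported


if __name__ == "__main__":
    database = {"کتا", "ب", "د", "ا", "ر"}  # hypothetical sub-word classes
    ok, missing = partition_vocabulary(["کتاب", "دارد", "مرد"], database)
    print(len(ok) / 3)  # share of this toy vocabulary that is supported
```

In this sketch, the share of supported words plays the role of the 68.2% figure reported in the abstract; the remaining words are the ones that only the language model stage can recover.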
The studies showed that when the input word is a member of the supported vocabulary, the correct word is in most cases among the first four hypotheses, even if the recognition result is incorrect. Usually, the scores of the first few hypotheses are close to each other, while the remaining hypotheses score far from the correct hypothesis. Since the system operates online, unnecessary computation should be avoided. Therefore, if the recognition stage produces more than four hypotheses, only the first four are passed to the language model. The recognition score of a new hypothesis is set as follows: if the recognition stage produced fewer than four hypotheses, the lowest hypothesis score is used; otherwise, the hypothesis score is used. Then, as with the earlier hypotheses, the language model score is calculated for the new hypotheses, and the final score of each hypothesis is obtained. Finally, the hypothesis with the highest score is taken as the system output, and the remaining hypotheses are displayed to the user. Experiments show that even when the system makes a mistake, the correct word is in most cases presented as the second hypothesis, and in some cases as the third. The method also aims to reduce the restrictions and writing rules imposed on users. In the presented method, the signs and the morpheme bodies of the input handwriting are first separated, and the body of each morpheme is associated with its signs. The signs of the morphemes are then identified, and based on them a set of words is taken as hypotheses. Each hypothesis is scored by measuring its similarity to the input handwriting, and the likely hypotheses are selected according to these scores. The language models are then used to arrive at the most likely hypotheses.
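The rescoring step described above (keep the first four recognition hypotheses, add new hypotheses, score everything with the language model, rank by final score) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: higher scores are assumed to be better, new hypotheses are assumed to inherit the lowest score among the retained hypotheses, and the linear mix controlled by `weight` is hypothetical, since the abstract does not state how the two scores are combined. The names `Hypothesis`, `rescore`, and `language_model` are also hypothetical.

```python
from dataclasses import dataclass, replace

MAX_HYPOTHESES = 4  # the abstract keeps only the first four recognition hypotheses


@dataclass
class Hypothesis:
    word: str
    recognition_score: float  # assumed: higher is better
    language_score: float = 0.0

    def final_score(self, weight):
        # Hypothetical linear combination of the two scores.
        return weight * self.recognition_score + (1.0 - weight) * self.language_score


def rescore(recognized, new_words, language_model, weight=0.5):
    """Rank retained and newly added hypotheses by their combined score.

    `language_model` is a hypothetical callable mapping a word to a score.
    """
    top = sorted(recognized, key=lambda h: h.recognition_score, reverse=True)
    top = top[:MAX_HYPOTHESES]
    # Assumption: new hypotheses get the lowest recognition score among those kept.
    fallback = min((h.recognition_score for h in top), default=0.0)
    candidates = [replace(h) for h in top]  # copies, so the inputs are not mutated
    candidates += [Hypothesis(w, fallback) for w in new_words]
    for h in candidates:
        h.language_score = language_model(h.word)
    return sorted(candidates, key=lambda h: h.final_score(weight), reverse=True)
```

The first element of the returned list corresponds to the system output, and the rest correspond to the alternatives shown to the user. Capping the list at four before calling the language model reflects the stated goal of avoiding unnecessary computation in an online system.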
Because the scores to be summed for a hypothesis are on different scales, a score normalization method is proposed. The results show that combining a language model with an online handwriting recognition system significantly reduces the word recognition error rate. In addition to reducing the error rate, the language model makes possible a technique that can cover the entire Persian vocabulary. With the proposed method, recognition accuracy improved to 95.9% at the initial letter-level stage and to 99.3% with the language model. Thus, using large linguistic resources for Persian together with a language model can improve recognition accuracy. For future work, a reinforcement learning algorithm is suggested to adapt the method to individual users.
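The need for score normalization before summing can be illustrated with a short sketch. The abstract does not say which normalization the authors use; min-max scaling is assumed here purely for illustration, and the function names and the `weight` parameter are hypothetical.

```python
def min_max_normalize(scores):
    """Rescale scores to [0, 1] (an assumed normalization, not the authors')."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores are equal, so no hypothesis is preferred on this scale.
        return [1.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def combine(recognition_scores, language_scores, weight=0.5):
    """Sum two score lists after bringing them onto a common scale."""
    rec = min_max_normalize(recognition_scores)
    lang = min_max_normalize(language_scores)
    return [weight * r + (1.0 - weight) * l for r, l in zip(rec, lang)]


if __name__ == "__main__":
    # Recognition scores on a 0-100 style scale, language scores as log-probabilities.
    totals = combine([10.0, 20.0, 30.0], [-8.0, -2.0, -6.0])
    print(totals.index(max(totals)))  # index of the best hypothesis
```

Without a step of this kind, the score with the larger numeric range would dominate the sum regardless of how informative it is.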

Type of Study: Research | Subject: Paper
Received: 2015/09/26 | Accepted: 2016/11/06 | Published: 2017/10/21 | ePublished: 2017/10/21
