Volume 17, Issue 4 (2-2021)                   JSDP 2021, 17(4): 155-168




School of Mathematics, Statistics and Computer Sciences, College of Science, University of Tehran
Abstract:
Automatic recognition of printed and handwritten text is a considerable aid to many applications because it facilitates entering data into computers and digitizing it. Research on automatic document recognition started decades ago with the recognition of isolated digits and letters; today, thanks to advances in machine learning methods, efforts are being made to recognize sequences of handwritten words. Based on the type of text, document recognition is generally divided into two main categories: printed and handwritten. Because the number of fonts is limited relative to the diversity of handwriting across writers, printed text is much easier to recognize than handwritten text; accordingly, printed-text recognition technology has matured and has been commercialized. Handwriting recognition is usually performed in one of two modes, online or offline; offline handwriting recognition involves the automated conversion of text in image form into characters that can be used in computer and text-processing applications. Most research in handwriting recognition has been conducted on Latin script, and a variety of tools and resources have been developed for it. This article focuses on applying the latest speech recognition methods to the recognition of Arabic handwriting. The task of modeling and recognizing handwritten text is very similar to that of modeling and recognizing speech, so approaches developed for speech recognition can be applied to handwriting recognition with slight changes. With the spread of hybrid HMM-DNN approaches and the use of sequence-level objective functions such as maximum mutual information (MMI), the accuracy of speech recognition systems has improved significantly.
This paper presents a pipeline for offline Arabic handwritten text recognition built on the open-source KALDI toolkit, which is well known in the speech recognition community, together with the latest hybrid models it provides and data augmentation techniques. The experiments were conducted on the Arabic KHATT database, where the pipeline achieved a 7.32% absolute reduction in word error rate (WER).
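The word error rate quoted above is conventionally defined as the word-level edit distance (substitutions, deletions, and insertions) between the recognized text and the reference transcription, divided by the number of reference words. The sketch below illustrates that standard definition only; the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration, not the scoring procedure used in the paper or in KALDI.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper (hypothetical name); tokens are split on whitespace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

An absolute reduction in WER is the difference between two such rates expressed in percentage points, as opposed to a relative reduction, which divides that difference by the baseline rate.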
Type of Study: Applicable | Subject: Paper
Received: 2019/02/17 | Accepted: 2019/09/02 | Published: 2021/02/22 | ePublished: 2021/02/22


Rights and permissions
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.