Volume 17, Issue 3 (11-2020)                   JSDP 2020, 17(3): 37-54


Ahmadi T, Karshenas H, Babaali B, Alinejad B. Allophone-based acoustic modeling for Persian phoneme recognition. JSDP. 2020; 17 (3) :37-54
URL: http://jsdp.rcisp.ac.ir/article-1-903-en.html
University of Isfahan
Abstract:
Phoneme recognition is one of the fundamental phases of automatic speech recognition. Coarticulation, which refers to the integration of adjacent sounds, is one of the main obstacles in phoneme recognition: each phone is influenced and changed by the characteristics of its neighboring phones, and coarticulation is responsible for most of these changes. Modeling the effects of speech context and using context-dependent models in phoneme recognition is one way to compensate for the negative effects of coarticulation. In this approach, if two occurrences of the same phoneme appear in different contexts, each is given a separate model. In this research, a linguistic method called allophonic modeling is used to model context effects in Persian phoneme recognition. In the first phase, the rules governing the occurrence of the various allophones of each phoneme are extracted from Persian linguistic resources, so that each phoneme is treated as a class consisting of its context-dependent forms, called allophones. Modeling and identifying allophones requires an allophonic corpus. Since no such corpus existed for the Persian language, the SMALL FARSDAT corpus was used. This corpus is manually segmented and labelled at the sentence, word and phoneme levels. The phonological and linguistic context required for the realization of allophones was then annotated in this corpus. For example, the corpus was syllabified, and the position of each phoneme (first, middle or end) in its word and syllable was marked with distinct numeric tags. In the next step, allophonic labeling was performed by searching the corpus for each allophonic context. The resulting allophonic corpus is used to model and recognize the allophones of input speech. Finally, each recognized allophone is assigned to its phonemic class, so that phoneme recognition is carried out by way of allophones.
The experimental results show that the proposed method achieves high accuracy in phoneme recognition, a significant improvement over other state-of-the-art methods.
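The labeling pipeline described in the abstract (position tagging, context search, mapping allophones back to phonemes) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Token` structure, the string position tags (the paper uses numeric tags), the two placeholder rules in `RULES`, the allophone names, and the sample word are hypothetical and are not taken from the paper or from Persian linguistic resources. Syllable boundaries are assumed to be given as input.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Token:
    phoneme: str
    syll_pos: str  # position in the syllable: "first", "middle" or "end"
    word_pos: str  # position in the word: "first", "middle" or "end"


def position(index: int, length: int) -> str:
    """Map an index within a unit of the given length to a position tag."""
    if index == 0:
        return "first"
    return "end" if index == length - 1 else "middle"


def tag_word(syllables: List[List[str]]) -> List[Token]:
    """Tag every phoneme of an already syllabified word with its positions."""
    total = sum(len(syl) for syl in syllables)
    tokens, offset = [], 0
    for syl in syllables:
        for i, ph in enumerate(syl):
            tokens.append(
                Token(ph, position(i, len(syl)), position(offset + i, total))
            )
        offset += len(syl)
    return tokens


# Hypothetical placeholder rules: (phoneme, context test, allophone label).
# Real rules would come from the linguistic sources the abstract mentions.
Rule = Tuple[str, Callable[[Token], bool], str]
RULES: List[Rule] = [
    ("k", lambda t: t.word_pos == "end", "k_final"),
    ("k", lambda t: t.syll_pos == "first", "k_onset"),
]


def label_allophone(token: Token) -> str:
    """Return the label of the first rule whose context matches the token."""
    for phoneme, condition, allophone in RULES:
        if token.phoneme == phoneme and condition(token):
            return allophone
    return token.phoneme + "_default"


def to_phoneme(allophone: str) -> str:
    """Collapse an allophone label back to its phonemic class."""
    return allophone.split("_", 1)[0]


# Hypothetical syllabified word, used only to exercise the functions.
word = [["k", "e"], ["t", "a", "b"]]
labels = [label_allophone(t) for t in tag_word(word)]
phonemes = [to_phoneme(a) for a in labels]
```

In this sketch the recognizer would be trained on the allophone labels, and `to_phoneme` performs the final step of assigning each recognized allophone to its phoneme class. The rules are checked in order, so an earlier rule takes priority when several contexts match.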
Type of Study: Applicable | Subject: Paper
Received: 2018/09/28 | Accepted: 2019/05/22 | Published: 2020/12/5 | ePublished: 2020/12/5

