[1] J. Sheykhzadegan and M. Bijankhan, "Persian speech dataset," in Proceedings of the Second Workshop on Persian Language and Computer, 2006, pp. 247–261.
[2] B. BabaAli, "A state-of-the-art and efficient framework for Persian speech recognition," Signal and Data Processing, vol. 13, no. 3, pp. 1–13, 2015.
[3] S. Z. Seyyedsalehi and S. A. Seyyedsalehi, "Improving the nonlinear manifold separator model for face recognition with a single image per person," Signal and Data Processing, vol. 12, no. 1, pp. 3–16, 2015.
[4] Z. Ansari and S. A. Seyyedsalehi, "Deep Modular Neural Networks with Double Spatio-temporal Association Structure for Persian Continuous Speech Recognition," Signal and Data Processing, vol. 13, no. 1, pp. 39–56, 2016.
[5] S. Z. Seyyedsalehi and S. A. Seyyedsalehi, "A new fast pre-training method for training of deep neural networks," Signal and Data Processing, vol. 10, no. 1, pp. 13–26, 2013.
[6] Y. Hifny and S. Renals, "Speech recognition using augmented conditional random fields," IEEE Transactions on Audio, Speech and Language Processing, vol. 17, no. 2, pp. 354–365, 2009. [DOI:10.1109/TASL.2008.2010286]
[7] K. H. Davis, R. Biddulph, and S. Balashek, "Automatic Recognition of Spoken Digits," The Journal of the Acoustical Society of America, vol. 24, no. 6, pp. 637–642, 1952.
[8] L. Rabiner and B. Juang, Fundamentals of Speech Recognition. Prentice Hall, 1993.
[9] R. P. Lippmann, "Speech recognition by machines and humans," Speech Communication, vol. 22, no. 1, pp. 1–15, 1997. [DOI:10.1016/S0167-6393(97)00021-6]
[10] O. Scharenborg, "Reaching over the gap: A review of efforts to link human and automatic speech recognition research," Speech Communication, vol. 49, no. 5, pp. 336–347, 2007. [DOI:10.1016/j.specom.2007.01.009]
[11] M. Ostendorf, "Moving Beyond the 'Beads-on-a-String' Model of Speech," in IEEE Automatic Speech Recognition and Understanding Workshop, 1999, pp. 79–83.
[12] H. Bourlard, H. Hermansky, and N. Morgan, "Towards increasing speech recognition error rates," Speech Communication, vol. 18, no. 3, pp. 205–231, 1996. [DOI:10.1016/0167-6393(96)00003-9]
[13] N. Morgan, Q. Zhu, and A. Stolcke, "Pushing the envelope - aside," IEEE Signal Processing Magazine, vol. 22, no. 5, pp. 81–88, 2005. [DOI:10.1109/MSP.2005.1511826]
[14] C. J. Leggetter and P. C. Woodland, "Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models," Computer Speech & Language, vol. 9, no. 2, pp. 171–185, 1995. [DOI:10.1006/csla.1995.0010]
[15] L. Lee and R. C. Rose, "Speaker normalization using efficient frequency warping procedures," in IEEE International Conference on Acoustics, Speech, and Signal Processing, 1996, vol. 1, pp. 353–356. [DOI:10.1109/ICASSP.1996.541105]
[16] L. Welling, S. Kanthak, and H. Ney, "Improved methods for vocal tract normalization," in IEEE International Conference on Acoustics, Speech, and Signal Processing, 1999, vol. 2, pp. 761–764. [DOI:10.1109/ICASSP.1999.759780]
[17] D. Povey, "Discriminative Training for Large Vocabulary Speech Recognition," PhD thesis, Cambridge University, Cambridge, UK, 2003.
[18] S. F. Chen and J. Goodman, "An empirical study of smoothing techniques for language modeling," in Proceedings of the 34th Annual Meeting of the Association for Computational Linguistics, 1996, pp. 310–318. [DOI:10.3115/981863.981904]
[19] G. E. Dahl, D. Yu, L. Deng, and A. Acero, "Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition," IEEE Transactions on Audio, Speech and Language Processing, vol. 20, no. 1, pp. 30–42, 2012. [DOI:10.1109/TASL.2011.2134090]
[20] G. Hinton, L. Deng, D. Yu, G. E. Dahl, A. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. N. Sainath, and B. Kingsbury, "Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups," IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 82–97, 2012. [DOI:10.1109/MSP.2012.2205597]
[21] A. R. Mohamed, G. Hinton, and G. Penn, "Understanding how deep belief networks perform acoustic modelling," in IEEE International Conference on Acoustics, Speech and Signal Processing, 2012, pp. 4273–4276. [DOI:10.1109/ICASSP.2012.6288863]
[22] R. Salakhutdinov and G. Hinton, "An Efficient Learning Procedure for Deep Boltzmann Machines," Neural Computation, vol. 24, no. 8, pp. 1967–2006, 2012. [DOI:10.1162/NECO_a_00311]
[23] R. Salakhutdinov, "Learning deep generative models," PhD thesis, University of Toronto, Toronto, ON, Canada, 2009.
[24] G. E. Hinton, "Training products of experts by minimizing contrastive divergence," Neural Computation, vol. 14, no. 8, pp. 1771–1800, 2002. [DOI:10.1162/089976602760128018]
[25] G. E. Hinton, S. Osindero, and Y. W. Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, no. 7, pp. 1527–1554, 2006. [DOI:10.1162/neco.2006.18.7.1527]
[26] P. Ramesh and J. G. Wilpon, "Modeling state durations in hidden Markov models for automatic speech recognition," in IEEE International Conference on Acoustics, Speech, and Signal Processing, 1992, vol. 1, pp. 381–384. [DOI:10.1109/ICASSP.1992.225892]
[27] J. T. Kao, G. Zweig, and P. Nguyen, "Discriminative duration modeling for speech recognition with segmental conditional random fields," in IEEE International Conference on Acoustics, Speech and Signal Processing, 2011, pp. 4476–4479.
[28] S. Z. Yu, "Hidden semi-Markov models," Artificial Intelligence, vol. 174, no. 2, pp. 215–243, 2010. [DOI:10.1016/j.artint.2009.11.011]
[29] S. J. Rennie, P. Fousek, and P. L. Dognin, "Factorial hidden restricted Boltzmann machines for noise robust speech recognition," in IEEE International Conference on Acoustics, Speech and Signal Processing, 2012, pp. 4297–4300. [DOI:10.1109/ICASSP.2012.6288869]
[30] J. Huang and B. Kingsbury, "Audio-visual deep learning for noise robust speech recognition," in IEEE International Conference on Acoustics, Speech and Signal Processing, 2013, pp. 7596–7599. [DOI:10.1109/ICASSP.2013.6639140]
[31] A. Maas and Q. Le, "Recurrent Neural Networks for Noise Reduction in Robust ASR," in Interspeech, 2012, pp. 3–6.
[32] H. Bourlard and N. Morgan, "Continuous speech recognition by connectionist statistical methods," IEEE Transactions on Neural Networks, vol. 4, no. 6, pp. 893–909, 1993. [DOI:10.1109/72.286885]
[33] A. J. Robinson, G. D. Cook, D. P. W. Ellis, E. Fosler-Lussier, S. J. Renals, and D. A. G. Williams, "Connectionist speech recognition of Broadcast News," Speech Communication, vol. 37, no. 1–2, pp. 27–45, 2002. [DOI:10.1016/S0167-6393(01)00058-9]
[34] Y. H. Sung and D. Jurafsky, "Hidden conditional random fields for phone recognition," in IEEE Workshop on Automatic Speech Recognition and Understanding, 2009, pp. 107–112. [DOI:10.1109/ASRU.2009.5373329]
[35] T. N. Sainath, A. R. Mohamed, B. Kingsbury, and B. Ramabhadran, "Deep convolutional neural networks for LVCSR," in IEEE International Conference on Acoustics, Speech and Signal Processing, 2013, pp. 8614–8618. [DOI:10.1109/ICASSP.2013.6639347]
[36] T. N. Sainath, B. Kingsbury, G. Saon, H. Soltau, A. R. Mohamed, G. Dahl, and B. Ramabhadran, "Deep Convolutional Neural Networks for Large-scale Speech Tasks," Neural Networks, vol. 64, pp. 39–48, 2014. [DOI:10.1016/j.neunet.2014.08.005]
[37] T. N. Sainath, B. Kingsbury, H. Soltau, and B. Ramabhadran, "Optimization techniques to improve training speed of deep neural networks for large speech tasks," IEEE Transactions on Audio, Speech and Language Processing, vol. 21, no. 11, pp. 2267–2276, 2013. [DOI:10.1109/TASL.2013.2284378]
[38] H. Lee, R. Grosse, R. Ranganath, and A. Y. Ng, "Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations," in Proceedings of the 26th Annual International Conference on Machine Learning, 2009, pp. 1–8.
[39] O. Abdel-Hamid, A. Mohamed, H. Jiang, L. Deng, G. Penn, and D. Yu, "Convolutional neural networks for speech recognition," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 22, no. 10, pp. 1533–1545, 2014. [DOI:10.1109/TASLP.2014.2339736]
[40] G. Heigold, "A log-linear discriminative modeling framework for speech recognition," PhD dissertation, RWTH Aachen University, Aachen, Germany, 2010.
[41] M. Russell and A. Cook, "Experimental evaluation of duration modelling techniques for automatic speech recognition," in IEEE International Conference on Acoustics, Speech, and Signal Processing, 1987, vol. 12, pp. 2376–2379. [DOI:10.1109/ICASSP.1987.1169918]
[42] H. Lee and H. Kwon, "Going Deeper with Contextual CNN for Hyperspectral Image Classification," IEEE Transactions on Image Processing, vol. 26, no. 10, pp. 4843–4855, 2017. [DOI:10.1109/TIP.2017.2725580]
[43] C. Dong, C. C. Loy, K. He, and X. Tang, "Image Super-Resolution Using Deep Convolutional Networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 2, pp. 295–307, 2016. [DOI:10.1109/TPAMI.2015.2439281]
[44] K. He, X. Zhang, S. Ren, and J. Sun, "Deep Residual Learning for Image Recognition," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770–778. [DOI:10.1109/CVPR.2016.90]
[45] A. Graves, A. Mohamed, and G. Hinton, "Speech Recognition with Deep Recurrent Neural Networks," in IEEE International Conference on Acoustics, Speech and Signal Processing, 2013, pp. 6645–6649.
[46] Y. Miao, M. Gowayyed, and F. Metze, "EESEN: End-to-end speech recognition using deep RNN models and WFST-based decoding," in IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), 2015, pp. 167–174.
[47] W. Song and J. Cai, "End-to-End Deep Neural Network for Automatic Speech Recognition," CS224d: Deep Learning for Natural Language Processing, Stanford University, pp. 1–8, 2015.
[48] S. Kapadia, V. Valtchev, and S. J. Young, "MMI training for continuous phoneme recognition on the TIMIT database," in IEEE International Conference on Acoustics, Speech, and Signal Processing, 1993, vol. 2, pp. 491–494. [DOI:10.1109/ICASSP.1993.319349]
[49] M. Bijankhan, J. Sheikhzadegan, M. R. Roohani, and Y. Samareh, "FARSDAT: the speech database of Farsi spoken language," in Proceedings of the Australian Conference on Speech Science and Technology, 1994, vol. 2, pp. 826–830.
[50] B. H. Juang, W. Chou, and C. H. Lee, "Minimum classification error rate methods for speech recognition," IEEE Transactions on Speech and Audio Processing, vol. 5, no. 3, pp. 257–265, 1997. [DOI:10.1109/89.568732]
[51] E. McDermott, T. J. Hazen, J. Le Roux, A. Nakamura, and S. Katagiri, "Discriminative Training for Large-Vocabulary Speech Recognition Using Minimum Classification Error," IEEE Transactions on Audio, Speech and Language Processing, vol. 15, no. 1, pp. 203–223, 2007. [DOI:10.1109/TASL.2006.876778]
[52] F. Sha and L. Saul, "Large Margin Gaussian Mixture Modeling for Phonetic Classification and Recognition," in IEEE International Conference on Acoustics, Speech and Signal Processing, 2006, vol. 1, pp. 265–268.
[53] G. Zweig, P. Nguyen, D. Van Compernolle, K. Demuynck, L. Atlas, P. Clark, G. Sell, M. Wang, F. Sha, H. Hermansky, D. Karakos, A. Jansen, S. Thomas, S. Bowman, and J. Kao, "Speech recognition with segmental conditional random fields," in IEEE International Conference on Acoustics, Speech and Signal Processing, 2011, pp. 5044–5047.