Deep Modular Neural Networks with Double Spatio-temporal Association Structure for Persian Continuous Speech Recognition

Ansari, Zohreh; Seyyedsalehi, Ali

Volume 13, Issue 1 (6-2016) JSDP 2016, 13(1): 39-56

Ansari Z, Seyyedsalehi A. Deep Modular Neural Networks with Double Spatio-temporal Association Structure for Persian Continuous Speech Recognition. JSDP 2016; 13 (1): 39-56
URL: http://jsdp.rcisp.ac.ir/article-1-277-en.html

Zohreh Ansari, Ali Seyyedsalehi*

Amirkabir University of Technology

Abstract:

In this article, we introduce growable deep modular neural networks for continuous speech recognition. These networks can be grown to model, at the same time, the spatio-temporal information of frame sequences at their input layer and of the corresponding label sequences at their output layer. A network trained with such a double spatio-temporal association structure can learn the subspace of phonetic sequences. It can therefore filter out invalid phonetic sequences within its own structure and output valid ones. To evaluate the performance of these growable networks, we used the FARSDAT and BIG FARSDAT datasets. Experimental results on FARSDAT show that deep modular neural networks outperform GMM-HMM models in phone accuracy rate by an absolute 2.7%. Moreover, extending deep modular neural networks to a double spatio-temporal association structure improves their result by 5.1%. Because BIG FARSDAT has no phonetic labeling, we propose a semi-supervised learning algorithm to fine-tune the network with the double spatio-temporal structure on this dataset; it achieves results comparable to those of HMMs.
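The abstract describes networks whose input layer sees a window of consecutive frames while the output layer predicts labels for a window of consecutive frames. The following is a minimal sketch of one plausible way to lay out training pairs for such a double spatio-temporal structure. It is not the authors' implementation. The function name `make_double_context_pairs`, the window half-widths `in_ctx` and `out_ctx`, the one-hot target encoding, and the replication of edge frames at utterance boundaries are all hypothetical choices made for illustration.

```python
from typing import List, Sequence, Tuple

# Assumption: one acoustic feature vector per frame (e.g. MFCCs), as a list of floats.
Frame = List[float]


def _clamp(i: int, n: int) -> int:
    """Clamp index i into [0, n-1] so windows at utterance edges repeat the edge frame."""
    return max(0, min(i, n - 1))


def make_double_context_pairs(
    frames: Sequence[Frame],
    labels: Sequence[int],
    num_phones: int,
    in_ctx: int = 2,   # hypothetical: frames on each side of t fed to the input layer
    out_ctx: int = 1,  # hypothetical: label frames on each side of t at the output layer
) -> List[Tuple[List[float], List[float]]]:
    """Build (input, target) pairs in which BOTH sides span several frames.

    input  = concatenation of frames t-in_ctx .. t+in_ctx
    target = concatenation of one-hot phone labels for frames t-out_ctx .. t+out_ctx
    """
    if len(frames) != len(labels):
        raise ValueError("frames and labels must be aligned one-to-one")
    n = len(frames)
    pairs: List[Tuple[List[float], List[float]]] = []
    for t in range(n):
        # Input side: temporal context of acoustic frames around t.
        x: List[float] = []
        for k in range(-in_ctx, in_ctx + 1):
            x.extend(frames[_clamp(t + k, n)])
        # Output side: temporal context of phone labels around t, one-hot per frame.
        y: List[float] = []
        for k in range(-out_ctx, out_ctx + 1):
            one_hot = [0.0] * num_phones
            one_hot[labels[_clamp(t + k, n)]] = 1.0
            y.extend(one_hot)
        pairs.append((x, y))
    return pairs


if __name__ == "__main__":
    # Toy utterance: 4 frames of 2-dimensional features, 3 phone classes.
    frames = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
    labels = [0, 0, 1, 2]
    pairs = make_double_context_pairs(frames, labels, num_phones=3, in_ctx=1, out_ctx=1)
    x0, y0 = pairs[0]
    print(len(x0))  # 6: three frames of two features each
    print(len(y0))  # 9: three label frames of three classes each
```

Because each target spans several consecutive labels, a network trained on such pairs only ever sees label sequences that occur in the training data, which is one way to read the claim that it learns the phonetic sequence subspace. Replicating edge frames rather than zero-padding is an arbitrary choice in this sketch.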

Keywords: Deep neural networks, Modular neural networks, Pre-training, Semi-supervised learning, Continuous speech recognition

Type of Study: Research | Subject: Paper
Received: 2014/10/19 | Accepted: 2016/02/26 | Published: 2016/06/22 | ePublished: 2016/06/22

Rights and permissions
	This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Signal and Data Processing
