Volume 13, Issue 1 (6-2016)                   JSDP 2016, 13(1): 39-56

Ansari Z, Seyyedsalehi A. Deep Modular Neural Networks with Double Spatio-temporal Association Structure for Persian Continuous Speech Recognition. JSDP 2016; 13(1): 39-56
URL: http://jsdp.rcisp.ac.ir/article-1-277-en.html
Amirkabir University of Technology
Abstract:

In this article, growable deep modular neural networks for continuous speech recognition are introduced. These networks can be grown to model, simultaneously, the spatio-temporal information of the frame sequences at their input layer and of the corresponding label sequences at their output layer. A neural network trained with such a double spatio-temporal association structure can learn the subspace of valid phonetic sequences; it can therefore filter out invalid phonetic sequences within its own structure and output only valid ones. To evaluate the performance of these growable neural networks, we used the FARSDAT and BIG FARSDAT datasets. Experimental results on FARSDAT show that deep modular neural networks exceed the phone accuracy rate of GMM-HMM models by 2.7% absolute. Moreover, extending deep modular neural networks to a double spatio-temporal association structure improves their results by 5.1%. Since BIG FARSDAT has no phonetic labeling, a semi-supervised learning algorithm is proposed to fine-tune the neural network with the double spatio-temporal structure on this dataset; the resulting network achieves results comparable to those of HMMs.
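The results above are stated as phone accuracy rates. As a rough illustration only (not the authors' evaluation code), the sketch below computes phone accuracy in the conventional way, (N - S - D - I) / N, where N is the number of reference phones and S, D, I are the substitutions, deletions, and insertions in a minimum edit-distance alignment. It assumes unit costs for all three error types; scoring tools such as HTK's HResults may weight the alignment differently, so counts can differ slightly. The function names `edit_distance` and `phone_accuracy` are hypothetical.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum number of substitutions + deletions + insertions (unit costs)."""
    # prev[j] holds the distance between the previous reference prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            substitute = prev[j - 1] + (ref[i - 1] != hyp[j - 1])
            delete = prev[j] + 1          # reference phone missing from hypothesis
            insert = curr[j - 1] + 1      # extra phone in hypothesis
            curr[j] = min(substitute, delete, insert)
        prev = curr
    return prev[-1]


def phone_accuracy(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Phone accuracy rate in percent: 100 * (N - S - D - I) / N (hypothetical helper)."""
    if not ref:
        raise ValueError("reference transcription must be non-empty")
    return 100.0 * (len(ref) - edit_distance(ref, hyp)) / len(ref)


# Usage: one deleted phone out of five reference phones
print(phone_accuracy("s a l a m".split(), "s a l m".split()))  # 80.0
```

Because insertions are penalized, accuracy (unlike correctness) can drop below zero when a recognizer emits many spurious phones. An "absolute improvement of 2.7%" then means a difference of 2.7 percentage points between two such accuracy values.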

Type of Study: Research | Subject: Paper
Received: 2014/10/19 | Accepted: 2016/02/26 | Published: 2016/06/22 | ePublished: 2016/06/22

Rights and permissions
Creative Commons License This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

© 2015 All Rights Reserved | Signal and Data Processing