Volume 5, Issue 2 (3-2009) | JSDP 2009, 5(2): 3-16

asww. JSDP 2009; 5(2): 3-16
URL: http://jsdp.rcisp.ac.ir/article-1-747-en.html
Abstract:

A model of spoken word recognition is proposed. The model is particularly concerned with extracting cues from the signal that lead to a specification of a word in terms of bundles of distinctive features, which are assumed to be the building blocks of words. In the proposed model, auditory input is chunked into a set of successive time slices, and it is assumed that the underlying word pattern is derived in three layers: features, phonemes, and words. The feature layer has a complete set of feature detectors at every time slice. In this layer, the underlying pattern of distinctive features is detected from the speech signal in three steps. In the first step, numerical values for features are obtained by measuring acoustic attributes in each time slice. These acoustic attributes are either acoustic landmarks, which correspond to articulator-free features and are identified from amplitude changes in various energy bands, or acoustic cues in the vicinity of the landmarks, which correspond to articulator-bound features. In the second step, the continuous perceptual feature values are processed into a much more structured, discrete representation, namely the phonological surface structure; this is carried out in a Perception Grammar as suggested by Boersma (1998). In the third step, further processing turns this discrete representation into an abstract one, yielding the underlying pattern of distinctive features. The next layer of the model has a complete set of phoneme detectors for every three time slices, but each set spans six time slices, so the sets overlap. As a result, the detection of adjacent phonemes also overlaps, which is intended to simulate coarticulation. The top layer has a complete set of word detectors centered on every three time slices; again the sets overlap, and the number of time slices per word detector is variable because it depends on the length of each individual word.
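The time-slice layout of the phoneme and word layers described in the abstract can be illustrated with a short sketch. This is not the author's implementation: only the stride of three slices and the six-slice phoneme span come from the abstract, while the names `Window`, `phoneme_windows`, `word_window`, `word_windows`, and `sets_covering`, the zero-based slice indexing, and the way a word window is centred and clipped at the edges of the input are hypothetical choices made for illustration.

```python
from dataclasses import dataclass

# Stride/span values are taken from the abstract; all names and the
# clipping/centring conventions below are hypothetical illustration choices.
PHONEME_STRIDE = 3  # a new set of phoneme detectors every three time slices
PHONEME_SPAN = 6    # each set spans six time slices, so neighbouring sets overlap
WORD_STRIDE = 3     # word detectors are centred on every third time slice


@dataclass(frozen=True)
class Window:
    """Half-open range of time slices [start, end) seen by one detector set."""
    start: int
    end: int

    def covers(self, t: int) -> bool:
        return self.start <= t < self.end


def phoneme_windows(n_slices: int) -> list[Window]:
    """Windows of the successive phoneme detector sets, clipped at the end of the input."""
    return [Window(s, min(s + PHONEME_SPAN, n_slices))
            for s in range(0, n_slices, PHONEME_STRIDE)]


def word_window(centre: int, word_len: int, n_slices: int) -> Window:
    """Window of a word detector centred on `centre`; its width is the word's length in slices."""
    start = max(0, centre - word_len // 2)
    end = min(n_slices, start + word_len)
    return Window(start, end)


def word_windows(word_len: int, n_slices: int) -> list[Window]:
    """One detector per centre position, every WORD_STRIDE slices, for a word of word_len slices."""
    return [word_window(c, word_len, n_slices) for c in range(0, n_slices, WORD_STRIDE)]


def sets_covering(t: int, windows: list[Window]) -> list[int]:
    """Indices of the detector sets whose window includes time slice t."""
    return [i for i, w in enumerate(windows) if w.covers(t)]


if __name__ == "__main__":
    windows = phoneme_windows(12)
    print([(w.start, w.end) for w in windows])  # [(0, 6), (3, 9), (6, 12), (9, 12)]
    print(sets_covering(4, windows))            # [0, 1]: slice 4 is seen by two phoneme sets
    print(word_window(6, 8, 12))                # Window(start=2, end=10)
```

For a 12-slice input, the phoneme detector sets start at slices 0, 3, 6, and 9, and every slice from 3 onward falls inside two sets; this double coverage is what makes the detection of adjacent phonemes overlap. Word windows follow the same stride, but their width is set per word rather than fixed.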

Type of Study: Research | Subject: Paper
Received: 2009/03/19 | Accepted: 2018/02/21 | Published: 2018/02/21 | ePublished: 2018/02/21


Rights and permissions
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

© 2015 All Rights Reserved | Signal and Data Processing