asww

Signal and Data Processing Journal. A scientific journal officially licensed by the Commission for Scientific Publications of the Ministry of Science, Research and Technology (MSRT). Publisher: Research Center for Development of Technologies

Volume 5, Issue 2 (3-2009) JSDP 2009, 5(2): 3-16

asww. JSDP 2009; 5 (2) :3-16
URL: http://jsdp.rcisp.ac.ir/article-1-747-en.html

Abstract:

A model of spoken word recognition is proposed. The model is particularly concerned with the extraction of cues from the signal that lead to a specification of a word in terms of bundles of distinctive features, which are assumed to be the building blocks of words. In the proposed model, auditory input is chunked into a set of successive time slices. The derivation of the underlying word pattern is assumed to proceed in three layers: features, phonemes, and words. The feature layer has a complete set of feature detectors at every time slice. In this layer, the detection of the underlying pattern of distinctive features from the speech signal proceeds in three steps. In the first step, numerical values for features are obtained by measuring acoustic attributes in each time slice. The acoustic attributes are either acoustic landmarks corresponding to articulator-free features, which are identified from amplitude changes in various energy bands, or acoustic cues in the vicinity of the landmarks corresponding to articulator-bound features. In the second step, the continuous perceptual feature values are processed into a much more structured, discrete representation, namely the phonological surface structure. This is carried out in a Perception Grammar as suggested by Boersma (1998). In the third step, further processing turns the discrete representation into an abstract one, yielding the underlying pattern of distinctive features. The next layer of the model has a complete set of phoneme detectors for every three time slices, but each set spans six time slices, so the sets overlap. This means that the detection of adjacent phonemes will also overlap, which is intended to simulate coarticulation. The top layer has a complete set of word detectors centered on every three time slices; again, the sets overlap. The number of time slices per word detector is variable because it depends on the length of each individual word.
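The overlap between phoneme detector sets described in the abstract can be illustrated with a minimal sketch. It assumes that a new set starts every three time slices and covers six, counting from slice 0; the abstract does not state the exact alignment, and the names `detector_windows` and `overlap` are hypothetical, chosen only for this illustration.

```python
def detector_windows(n_slices, stride=3, span=6):
    """Return (start, end) slice ranges, end exclusive, one per detector set.

    Assumption: a set begins every `stride` slices and covers `span` slices;
    the last windows are truncated at the end of the input.
    """
    windows = []
    for start in range(0, n_slices, stride):
        end = min(start + span, n_slices)
        windows.append((start, end))
    return windows


def overlap(a, b):
    """Number of time slices shared by two (start, end) windows."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


windows = detector_windows(12)
print(windows)                          # [(0, 6), (3, 9), (6, 12), (9, 12)]
print(overlap(windows[0], windows[1]))  # 3
```

Under these assumptions, adjacent sets share three time slices, which is the region where evidence for neighbouring phonemes is evaluated together. Word detectors would use the same stride but a span that varies with word length.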

Keywords: Optimality, Perception Grammar, Recognition Grammar, Acoustic constrai

Type of Study: Research | Subject: Paper
Received: 2009/03/19 | Accepted: 2018/02/21 | Published: 2018/02/21 | ePublished: 2018/02/21

Rights and permissions
	This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.