Signal and Data Processing

Performance Improvement of Continuous Speech Recognition System Using Extracted Features of Speech Manifolds in the Reconstructed Phase Space

Speech processing papers. Research paper.

Designing new methods for extracting features from the speech signal and combining the information they provide is one of the most effective approaches to improving the performance of automatic speech recognition (ASR) systems. Recent research has shown that the speech signal exhibits nonlinear and chaotic behavior, but this property is not exploited in continuous ASR systems. The reconstructed phase space (RPS) is a suitable domain for representing the nonlinear dynamics of a chaotic signal. Therefore, this paper proposes a new RPS-based feature extraction method (LLRPS). The features are computed from similarity scores between the trajectory of the speech signal embedded in the RPS and a set of predefined phoneme manifolds. A TMLP neural network then estimates phoneme posterior probabilities from the LLRPS features. The network structure is chosen so that, in addition to extracting dynamic information, it can implement a variety of output combination methods. Experimental results on the Farsdat speech database show that a nonlinear combination of the outputs of recognition systems using conventional MFCC cepstral features and the proposed LLRPS features improves frame recognition accuracy by 3.94% and phoneme recognition accuracy by 4.02% relative to the baseline recognition system.

Keywords: Continuous speech recognition, Feature extraction, Reconstructed phase space, Phoneme manifolds, Likelihood score, Neural network

42 27
http://jsdp.rcisp.ac.ir/browse.php?a_code=A-10-306-1&slc_lang=fa&sid=1

Yasser Shekofteh (یاسر شکفته), shekofteh.yasser@gmail.com, Amirkabir University (دانشگاه امیرکبیر)
Farshad Almasganj (فرشاد الماس گنج), falmas214@yahoo.com, Amirkabir University (دانشگاه امیرکبیر)
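
The abstract does not give the embedding parameters or the form of the phoneme manifold models, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a standard time-delay embedding to build the reconstructed phase space, and it swaps in a diagonal Gaussian log-likelihood as a stand-in for the similarity score against a phoneme model. The names `delay_embed` and `gaussian_log_likelihood`, the values `dim=3` and `lag=6`, and the two toy models are all hypothetical.

```python
import numpy as np

def delay_embed(signal, dim=3, lag=6):
    """Time-delay embedding: row n is [x[n], x[n+lag], ..., x[n+(dim-1)*lag]].

    dim and lag are illustrative assumptions, not values from the paper.
    """
    x = np.asarray(signal, dtype=float)
    n_points = x.size - (dim - 1) * lag
    if n_points <= 0:
        raise ValueError("signal too short for this dim and lag")
    return np.stack([x[k * lag : k * lag + n_points] for k in range(dim)], axis=1)

def gaussian_log_likelihood(points, mean, var):
    """Average per-point log-likelihood under a diagonal Gaussian.

    This is a stand-in for a phoneme manifold model (assumption).
    """
    diff = points - mean
    per_point = -0.5 * (np.log(2 * np.pi * var) + diff ** 2 / var).sum(axis=1)
    return per_point.mean()

# Toy frame: a sine wave standing in for one frame of speech samples.
t = np.arange(400)
frame = np.sin(2 * np.pi * t / 50)
traj = delay_embed(frame, dim=3, lag=6)  # trajectory in the RPS

# Hypothetical "phoneme models", each given as (mean, variance) in the RPS.
models = {
    "model_a": (np.zeros(3), np.full(3, 0.5)),
    "model_b": (np.full(3, 2.0), np.full(3, 0.5)),
}

# One score per model; stacked over models, these form a feature vector for the frame.
scores = np.array([gaussian_log_likelihood(traj, m, v) for m, v in models.values()])
```

In a setup like the one the abstract describes, a score vector of this kind would be computed per frame and passed to a neural network that estimates phoneme posteriors; here the scores only rank two toy models against one synthetic trajectory.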