<?xml version="1.0" encoding="utf-8"?>
<journal>
<title>Signal and Data Processing</title>
<title_fa>پردازش علائم و داده‌ها</title_fa>
<short_title>JSDP</short_title>
<subject>Engineering &amp; Technology</subject>
<web_url>http://jsdp.rcisp.ac.ir</web_url>
<journal_hbi_system_id>1</journal_hbi_system_id>
<journal_hbi_system_user>admin</journal_hbi_system_user>
<journal_id_issn>2538-4201</journal_id_issn>
<journal_id_issn_online>2538-421X</journal_id_issn_online>
<journal_id_pii></journal_id_pii>
<journal_id_doi>10.66224/jsdp</journal_id_doi>
<journal_id_iranmedex></journal_id_iranmedex>
<journal_id_magiran></journal_id_magiran>
<journal_id_sid>1</journal_id_sid>
<journal_id_nlai>8888</journal_id_nlai>
<journal_id_science></journal_id_science>
<language>fa</language>
<pubdate>
	<type>jalali</type>
	<year>1396</year>
	<month>3</month>
	<day>1</day>
</pubdate>
<pubdate>
	<type>gregorian</type>
	<year>2017</year>
	<month>6</month>
	<day>1</day>
</pubdate>
<volume>14</volume>
<number>1</number>
<publish_type>online</publish_type>
<publish_edition>1</publish_edition>
<article_type>fulltext</article_type>
<articleset>
	<article>


	<language>fa</language>
	<article_id_doi></article_id_doi>
	<title_fa>مقایسه روش‌های طیفی برای شناسایی زبان گفتاری</title_fa>
	<title>A survey on spectral methods in spoken language identification</title>
	<subject_fa>مقالات پردازش گفتار</subject_fa>
	<subject>Paper</subject>
	<content_type_fa>پژوهشي</content_type_fa>
	<content_type>Research</content_type>
	<abstract_fa>&lt;p dir=&quot;RTL&quot;&gt;&amp;nbsp;&lt;strong&gt;&lt;span style=&quot;font-family:b nazanin;&quot;&gt;شناسایی خودکار زبان گفتاری به تشخیص زبان از روی سیگنال گفتار گفته می&amp;shy;شود. شناسایی زبان به&#8204;طورمعمول به یکی از&amp;nbsp; دو دسته روش آوایی و طیفی انجام می&amp;shy;شود. در این مقاله، انواع روش&amp;shy;های مختلف طیفی برای بازشناسی زبان گفتاری معرفی شده و نتایج به&#8204;کارگیری آنها بر روی یک مجموعه دادگان گفتاری تلفنی محاوره&amp;shy;ای مقایسه شده است. روش طیفی پایۀ شناسایی زبان، مدل مخلوط گوسی-مدل جهانی (&lt;/span&gt;&lt;/strong&gt;&lt;strong&gt;&lt;span dir=&quot;LTR&quot;&gt;&lt;span style=&quot;font-family:times new roman bold,serif;&quot;&gt;&lt;span style=&quot;font-size:8.0pt;&quot;&gt;GMM-UBM&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/strong&gt;&lt;strong&gt;&lt;span style=&quot;font-family:b nazanin;&quot;&gt;) است. برای بهبود مدل گوسی هر زبان از روش تمایزی &lt;/span&gt;&lt;/strong&gt;&lt;strong&gt;&lt;span dir=&quot;LTR&quot;&gt;&lt;span style=&quot;font-family:times new roman bold,serif;&quot;&gt;&lt;span style=&quot;font-size:8.0pt;&quot;&gt;MMI&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/strong&gt;&lt;strong&gt;&lt;span style=&quot;font-family:b nazanin;&quot;&gt; و برای مدل&amp;shy;کردن دینامیک زبان از مدل پنهان مارکوف ارگودیک (&lt;/span&gt;&lt;/strong&gt;&lt;strong&gt;&lt;span dir=&quot;LTR&quot;&gt;&lt;span style=&quot;font-family:times new roman bold,serif;&quot;&gt;&lt;span style=&quot;font-size:8.0pt;&quot;&gt;EHMM&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/strong&gt;&lt;strong&gt;&lt;span style=&quot;font-family:b nazanin;&quot;&gt;) استفاده می&amp;shy;شود. 
روش&amp;shy;های &lt;/span&gt;&lt;/strong&gt;&lt;strong&gt;&lt;span dir=&quot;LTR&quot;&gt;&lt;span style=&quot;font-family:times new roman bold,serif;&quot;&gt;&lt;span style=&quot;font-size:8.0pt;&quot;&gt;GSV-SVM&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/strong&gt;&lt;strong&gt;&lt;span style=&quot;font-family:b nazanin;&quot;&gt; و روش نشانه&amp;shy;گذار مبتنی بر &lt;/span&gt;&lt;/strong&gt;&lt;strong&gt;&lt;span dir=&quot;LTR&quot;&gt;&lt;span style=&quot;font-family:times new roman bold,serif;&quot;&gt;&lt;span style=&quot;font-size:8.0pt;&quot;&gt;GMM&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/strong&gt;&lt;strong&gt;&lt;span style=&quot;font-family:b nazanin;&quot;&gt; (&lt;/span&gt;&lt;/strong&gt;&lt;strong&gt;&lt;span dir=&quot;LTR&quot;&gt;&lt;span style=&quot;font-family:times new roman bold,serif;&quot;&gt;&lt;span style=&quot;font-size:8.0pt;&quot;&gt;GMM Tokenizer&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/strong&gt;&lt;strong&gt;&lt;span style=&quot;font-family:b nazanin;&quot;&gt;) نیز دو روش طیفی دیگر است که مورد بررسی قرار گرفته است. در این مقاله همچنین روش&amp;shy;های جدیدِ مدل&amp;shy;سازی تنوعات کانال و گوینده (تحلیل توأم عامل&amp;shy;ها (&lt;/span&gt;&lt;/strong&gt;&lt;strong&gt;&lt;span dir=&quot;LTR&quot;&gt;&lt;span style=&quot;font-family:times new roman bold,serif;&quot;&gt;&lt;span style=&quot;font-size:8.0pt;&quot;&gt;JFA&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/strong&gt;&lt;strong&gt;&lt;span style=&quot;font-family:b nazanin;&quot;&gt;) و بردار شناسایی (&lt;/span&gt;&lt;/strong&gt;&lt;strong&gt;&lt;span dir=&quot;LTR&quot;&gt;&lt;span style=&quot;font-family:times new roman bold,serif;&quot;&gt;&lt;span style=&quot;font-size:8.0pt;&quot;&gt;i-Vector&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/strong&gt;&lt;strong&gt;&lt;span style=&quot;font-family:b nazanin;&quot;&gt;)) به&#8204;کار رفته و برای بهبود نتایج آن از چند روش&amp;shy; جبران&amp;shy;سازی تنوعات استفاده شده است. 
علاوه&#8204;براین برای سهولت تصمیم&amp;shy;گیری و کاهش خطای سامانۀ شناسایی زبان، از پس&amp;shy;پردازش امتیاز استفاده شده است. این مقاله بخشی از هفت سال پژوهش&#8204; در زمینه شناسایی زبان گفتاری در پژوهشگاه توسعه فناوری&amp;shy;های پیشرفته خواجه نصیرالدین طوسی است و تنها خلاصه&amp;shy;ای از روش&amp;shy;ها و نتایج به&#8204;دست&#8204;آمده در این مقاله آورده شده است.&lt;/span&gt;&lt;/strong&gt;&lt;/p&gt;
</abstract_fa>
	<abstract>&lt;p&gt;&lt;strong&gt;Automatic spoken language identification is the task of recognizing the language being spoken from the speech signal. Language identification systems fall into two categories: spectral-based methods and phonetic-based methods. In the former, short-time characteristics of the speech spectrum are extracted as multi-dimensional feature vectors, and a statistical model of these features is then trained for each language. The Gaussian mixture model (GMM) is the most common statistical model in spectral-based language identification systems. In phonetic-based methods, on the other hand, the speech signal is converted into a sequence of tokens using a hidden Markov model (HMM), and a language model is trained on the resulting sequences. PRLM, PPRLM, and PR-SVM are examples of phonetic-based methods. Published systems usually combine phonetic-based and spectral-based subsystems to achieve high-quality language identification. Spectral-based methods have attracted particular attention from researchers because they do not need labeled data and usually achieve better results than phonetic approaches. Therefore, in this paper, various spectral methods for spoken language identification are introduced, implemented, and compared.&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;The basic spectral language identification method is the Gaussian Mixture Model-Universal Background Model (GMM-UBM). In this paper, the discriminative MMI (maximum mutual information) training method is used to improve the Gaussian model of each language. Moreover, to model the dynamics of the language, the GMM is replaced with an ergodic hidden Markov model (EHMM). The GSV-SVM and GMM tokenizer methods are also implemented as two popular spectral approaches. In addition, recent speaker and channel variability modeling methods, namely joint factor analysis (JFA) and the identity vector (i-Vector), are applied to language identification, and several variability compensation methods are used to improve the i-Vector results.&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;Furthermore, different score post-processing methods are applied to boost the performance of the language recognition system. Each element of the raw score vector indicates the degree to which the speech signal belongs to a given language. Post-processing methods act as a classifier on this vector: they map the raw score vector to the space of the desired languages, which allows better language detection decisions. Previous studies have employed different post-processing methods, including GMM, NN, SVM, and LLR. This study uses several score post-processing methods to improve the quality of language recognition.&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;The goal of the experiments in this article is to detect Farsi, English, and Arabic and to distinguish them from other languages, both individually and simultaneously; the latter is also called open-set language identification. The signals used in this paper are two-sided telephone conversations, whose quality is usually poor due to strong noise, background voices or music, accents, etc.&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;The Gaussian mixture model-universal background model (GMM-UBM) was implemented as the baseline method. With this approach, the mean EER over the three target languages (Farsi, English, and Arabic) was 13.58%. Experimental results indicated that training the GMM language identification system with the discriminative MMI algorithm is more effective than training it with the ML algorithm alone: the mean EER of the three target languages was reduced by about 8 percent compared to GMM-UBM. The GMM tokenizer method was also tested as a novel spectral approach; with it, the mean EER of the three target languages was about 5 percent lower than with GMM-UBM.&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;In this study, the discriminative GSV-SVM method was also used for language recognition. Its results were considerably better than those of the common spectral approaches: the mean EER of the three target languages was reduced by 11 percent compared to GMM-UBM. The low speed of this method was improved using model pushing.&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;This study also implemented two recent methods, JFA and i-Vector. Both provided better results than GMM-UBM, reducing the mean EER of the three target languages by 1% and 12%, respectively. Overall, the experimental results showed that i-Vector outperforms the other spectral language identification systems.&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;This study is the result of seven years of research on spoken language identification at the Khajeh Nasiredin Tousi Advanced Technology Development Center. Ongoing research includes studying and implementing recent spectral language identification algorithms such as PLDA, as well as state-of-the-art phonetic language identification methods, in order to combine the spectral and phonetic systems and ultimately achieve a high-quality language identification system.&lt;/strong&gt;&lt;/p&gt;
</abstract>
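<!--
Editorial sketch (not from the paper): the results above are reported as mean EER over the three target languages. A minimal, hypothetical helper for computing the equal error rate of one language detector from its target and non-target trial scores might look like the following. The function name and the threshold sweep over observed scores are assumptions; this is not the authors' evaluation code and does not reproduce their results.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the equal error rate (EER) of a detector.

    Hypothetical helper: sweeps every observed score as a decision
    threshold and returns the average of the miss and false-alarm
    rates at the threshold where the two rates are closest.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    n_tgt, n_non = len(target_scores), len(nontarget_scores)
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # Miss: a target trial scored below the threshold is rejected.
        p_miss = sum(s < t for s in target_scores) / n_tgt
        # False alarm: a non-target trial at or above the threshold is accepted.
        p_fa = sum(s >= t for s in nontarget_scores) / n_non
        gap = abs(p_miss - p_fa)
        if gap < best_gap:
            best_gap, eer = gap, (p_miss + p_fa) / 2
    return eer


# Per-language EERs (fractions) would then be averaged and multiplied by 100
# to give a mean EER in percent, presumably the form used in the abstract.
```

For example, equal_error_rate([2, 3, 4], [0, 1]) returns 0.0, because these toy scores are perfectly separated.
-->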
	<keyword_fa>شناسایی خودکار زبان گفتاری, روش‌های طیفی, آموزش تمایزی, جبران‌سازی تنوعات کانال, بردار شناسایی</keyword_fa>
	<keyword>Automatic Spoken Language Recognition, Acoustic Approaches, Discriminative Training, Channel Compensation, Identity Vector</keyword>
	<start_page>111</start_page>
	<end_page>134</end_page>
	<web_url>http://jsdp.rcisp.ac.ir/browse.php?a_code=A-10-798-1&amp;slc_lang=fa&amp;sid=1</web_url>


<author_list>
	<author>
	<first_name>Shaghayegh</first_name>
	<middle_name></middle_name>
	<last_name>Reza</last_name>
	<suffix></suffix>
	<first_name_fa>شقایق</first_name_fa>
	<middle_name_fa></middle_name_fa>
	<last_name_fa>رضا</last_name_fa>
	<suffix_fa></suffix_fa>
	<email>shaghayegh.reza@gmail.com</email>
	<code>10031947532846005230</code>
	<orcid>10031947532846005230</orcid>
	<coreauthor>Yes</coreauthor>
	<affiliation>Amirkabir University</affiliation>
	<affiliation_fa>پژوهشکده پردازش داده، پژوهشگاه توسعه فناوری‌های پیشرفته خواجه‌نصیرالدین طوسی</affiliation_fa>
	 </author>


	<author>
	<first_name>Jahanshah</first_name>
	<middle_name></middle_name>
	<last_name>Kabudian</last_name>
	<suffix></suffix>
	<first_name_fa>جهانشاه</first_name_fa>
	<middle_name_fa></middle_name_fa>
	<last_name_fa>کبودیان</last_name_fa>
	<suffix_fa></suffix_fa>
	<email>kabudian@razi.ac.ir</email>
	<code>10031947532846005231</code>
	<orcid>10031947532846005231</orcid>
	<coreauthor>No</coreauthor>
	<affiliation>Razi University, Kermanshah</affiliation>
	<affiliation_fa>دانشگاه رازی کرمانشاه</affiliation_fa>
	 </author>


</author_list>


	</article>
</articleset>
</journal>
