<?xml version="1.0" encoding="utf-8"?>
<journal>
<title>Signal and Data Processing</title>
<title_fa>پردازش علائم و داده‌ها</title_fa>
<short_title>JSDP</short_title>
<subject>Engineering &amp; Technology</subject>
<web_url>http://jsdp.rcisp.ac.ir</web_url>
<journal_hbi_system_id>1</journal_hbi_system_id>
<journal_hbi_system_user>admin</journal_hbi_system_user>
<journal_id_issn>2538-4201</journal_id_issn>
<journal_id_issn_online>2538-421X</journal_id_issn_online>
<journal_id_pii></journal_id_pii>
<journal_id_doi>10.61882/jsdp</journal_id_doi>
<journal_id_iranmedex></journal_id_iranmedex>
<journal_id_magiran></journal_id_magiran>
<journal_id_sid>1</journal_id_sid>
<journal_id_nlai>8888</journal_id_nlai>
<journal_id_science></journal_id_science>
<language>fa</language>
<pubdate>
	<type>jalali</type>
	<year>1399</year>
	<month>3</month>
	<day>1</day>
</pubdate>
<pubdate>
	<type>gregorian</type>
	<year>2020</year>
	<month>6</month>
	<day>1</day>
</pubdate>
<volume>17</volume>
<number>1</number>
<publish_type>online</publish_type>
<publish_edition>1</publish_edition>
<article_type>fulltext</article_type>
<articleset>
	<article>


	<language>fa</language>
	<article_id_doi></article_id_doi>
	<title_fa>بهبود الگوریتم ماشین بردار پشتیبان با الگوریتم رقابت استعماری برای دسته‌بندی اسناد متنی</title_fa>
	<title>An Improvement in Support Vector Machines Algorithm with Imperialism Competitive Algorithm for Text Documents Classification</title>
	<subject_fa>مقالات پردازش متن </subject_fa>
	<subject>Paper</subject>
	<content_type_fa>پژوهشي</content_type_fa>
	<content_type>Research</content_type>
	<abstract_fa>&lt;div style=&quot;text-align: justify;&quot;&gt;&lt;strong&gt;&lt;span style=&quot;font-family:B Nazanin;&quot;&gt;&lt;span style=&quot;font-size:10.0pt;&quot;&gt;با توجه به رشد نمایی متون الکترونیکی، سازماندهی و مدیریت متون، مستلزم ابزاری است که اطلاعات و داده&amp;rlm;&#8204;های مورد جستجوی کاربران را در کمترین زمان ارائه دهد؛ از&#8204;این&#8204;رو در سال&#8204;های اخیر روش&#8204;های دسته&#8204;بندی اهمیت ویژه&amp;lrm;ای پیدا کرده است. هدف دسته&#8204;بندی متون دست&#8204;یابی به اطلاعات و داده&#8204;ها در کسری از ثانیه است. یکی از مشکلات اصلی در دسته&amp;rlm;&#8204;بندی متون، ابعاد بالای ویژگی&amp;lrm;هاست. برای کاهش ویژگی&amp;lrm;های متون، انتخاب ویژگی&amp;lrm;ها یکی از مؤثرترین راه&amp;lrm;حل&amp;lrm;هاست. چراکه هزینه محاسباتی که تابعی از طول بردار ویژگی&amp;lrm;هاست، بدون انتخاب ویژگی&#8204;ها افزایش می&amp;rlm;&#8204;یابد. در این مقاله روشی براساس بهبود الگوریتم ماشین بردار پشتیبان با الگوریتم رقابت استعماری برای دسته&#8204;بندی اسناد متنی ارائه شده است. در روش پیشنهادی، از الگوریتم رقابت استعماری برای انتخاب ویژگی&amp;lrm;های و از الگوریتم ماشین بردار پشتیبان برای دسته&amp;lrm;بندی متون استفاده شده است.&lt;/span&gt;&lt;/span&gt;&lt;/strong&gt; &lt;strong&gt;&lt;span style=&quot;font-family:B Nazanin;&quot;&gt;&lt;span style=&quot;font-size:10.0pt;&quot;&gt;آزمایش و ارزیابی روش پیشنهادی بر روی مجموعه داده&#8204;های&amp;nbsp;&lt;/span&gt;&lt;/span&gt;&lt;/strong&gt;&lt;strong&gt;&lt;span dir=&quot;LTR&quot;&gt;&lt;span style=&quot;font-family:Times New Roman Bold,serif;&quot;&gt;&lt;span style=&quot;font-size:8.0pt;&quot;&gt;Reuters21578, WebKB&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/strong&gt;&lt;strong&gt;&lt;span style=&quot;font-family:B Nazanin;&quot;&gt;&lt;span style=&quot;font-size:10.0pt;&quot;&gt; و &lt;/span&gt;&lt;/span&gt;&lt;/strong&gt;&lt;strong&gt;&lt;span dir=&quot;LTR&quot;&gt;&lt;span style=&quot;font-family:Times New Roman Bold,serif;&quot;&gt;&lt;span style=&quot;font-size:8.0pt;&quot;&gt;Cade 
12&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/strong&gt;&lt;strong&gt;&lt;span style=&quot;font-family:B Nazanin;&quot;&gt;&lt;span style=&quot;font-size:10.0pt;&quot;&gt; انجام شده است. نتایج شبیه&amp;lrm;سازی حاکی از آن است که روش پیشنهادی در معیارهای دقت، بازخوانی و &lt;/span&gt;&lt;/span&gt;&lt;/strong&gt;&lt;strong&gt;&lt;span dir=&quot;LTR&quot;&gt;&lt;span style=&quot;font-family:Times New Roman Bold,serif;&quot;&gt;&lt;span style=&quot;font-size:8.0pt;&quot;&gt;F Measure&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/strong&gt;&lt;strong&gt;&lt;span style=&quot;font-family:B Nazanin;&quot;&gt;&lt;span style=&quot;font-size:10.0pt;&quot;&gt; از روش&#8204; ماشین بردار پشتیبان بدون انتخاب ویژگی عملکرد بهینه&amp;lrm;تری دارد. &lt;/span&gt;&lt;/span&gt;&lt;/strong&gt;&lt;/div&gt;
</abstract_fa>
	<abstract>&lt;div style=&quot;text-align: justify;&quot;&gt;&lt;strong&gt;Due to the exponential growth of electronic texts, organizing and managing them requires a tool that delivers the information and data users are searching for in the shortest possible time. Classification methods have therefore become very important in recent years.&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;In natural language processing, and especially in text processing, automatic text classification is one of the most basic tasks. Moreover, text classification is one of the most important parts of data mining and machine learning. Classification can be considered the most important supervised technique: it partitions the input space into k groups based on similarity and difference, such that targets in the same group are similar and targets in different groups are different. Text classification systems have been widely used in many fields, such as spam filtering, news classification, web page detection, bioinformatics, machine translation, automatic response systems, and the automatic organization of documents.&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;The key to an efficient text classification method is the extraction and selection of the key features of texts. It has been shown that only 33% of the words and features of a text are useful for extracting information; most words in a text serve to express its purpose and are sometimes repeated. Feature selection is known as a good solution to the high dimensionality of the feature space. An excessive number of features not only increases computation time but also degrades classification accuracy. In general, the purpose of extracting and selecting features of texts is to reduce data volume, training time, and computational time, and to increase the speed of the methods proposed for text classification. Feature extraction refers to the process of generating a small set of new features by combining or transforming the original ones, while in feature selection the dimension of the space is reduced by selecting the most prominent features.&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;In this paper, a solution for improving the support vector machine algorithm using the Imperialist Competitive Algorithm is provided. In the proposed method, the Imperialist Competitive Algorithm is used for selecting features and the support vector machine algorithm is used for classifying texts. &lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;At the feature extraction stage, each extracted word is assigned a weight using a weighting scheme such as NORMTF, LOGTF, ITF, SPARCK, or TF, in order to determine how strongly the word acts as a keyword of the text. The weight of each word indicates the extent of its effect on the main topic of the text compared to the other words used in the same text. In the proposed method, the TF weighting scheme is used to assign weights to the words. In this scheme, the features are a function of the distribution of different features in each of the documents &lt;/strong&gt; &lt;img alt=&quot;&quot; chromakey=&quot;white&quot; src=&quot;file:///C:/Users/user/AppData/Local/Temp/msohtmlclip1/01/clip_image001.png&quot; &gt; &lt;strong&gt;. &lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;Moreover, at this stage, low-frequency features and words that occur fewer than two times in the text are pruned. Pruning basically filters low-frequency features in a text [18]. &lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;In order to reduce the number of dimensions of the features and decrease computational complexity, the imperialist competitive algorithm (ICA) is utilized in the proposed method. The main goal of employing the imperialist competitive algorithm (ICA) in the proposed method is minimizing the loss of data in the texts, while also maximizing the reduction of the dimensions of the features. &lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;Because the imperialist competitive algorithm (ICA) is used for selecting the features in the proposed method, a mapping must be created between the parameters of the ICA and the proposed method. Accordingly, when the ICA is used for selecting the key features, the search space consists of the dimensions of the features, and among all the extracted features, &lt;/strong&gt; &lt;img alt=&quot;&quot; chromakey=&quot;white&quot; src=&quot;file:///C:/Users/user/AppData/Local/Temp/msohtmlclip1/01/clip_image002.png&quot; &gt; &lt;strong&gt;, &lt;/strong&gt; &lt;img alt=&quot;&quot; chromakey=&quot;white&quot; src=&quot;file:///C:/Users/user/AppData/Local/Temp/msohtmlclip1/01/clip_image003.png&quot; &gt; &lt;strong&gt;, or &lt;/strong&gt; &lt;img alt=&quot;&quot; chromakey=&quot;white&quot; src=&quot;file:///C:/Users/user/AppData/Local/Temp/msohtmlclip1/01/clip_image004.png&quot; &gt; &lt;strong&gt;&amp;nbsp;of all the features are assigned to each of the countries. Since the assignment is carried out randomly, a country may also contain repeated features. Next, following the general procedure of the ICA, the more powerful countries are designated as imperialists, while the other countries are designated as colonies. Once the countries are identified, the optimization process can begin. Each country is defined in the form of an &lt;/strong&gt; &lt;img alt=&quot;&quot; chromakey=&quot;white&quot; src=&quot;file:///C:/Users/user/AppData/Local/Temp/msohtmlclip1/01/clip_image005.png&quot; &gt; &lt;strong&gt;&amp;nbsp;array with different values for the variables as in Equations 2 and 3. &lt;/strong&gt;&lt;/div&gt;

&lt;table align=&quot;center&quot; border=&quot;0&quot; cellpadding=&quot;0&quot; cellspacing=&quot;0&quot; dir=&quot;rtl&quot; style=&quot;width:293px;&quot; width=&quot;293&quot;&gt;
	&lt;tbody&gt;
		&lt;tr&gt;
			&lt;td style=&quot;width: 47px; height: 34px; text-align: justify;&quot;&gt;&lt;strong&gt;(2)&lt;/strong&gt;&lt;/td&gt;
			&lt;td style=&quot;width: 246px; height: 34px; text-align: justify;&quot;&gt;&lt;strong&gt;Country = [&lt;/strong&gt; &lt;img alt=&quot;&quot; chromakey=&quot;white&quot; src=&quot;file:///C:/Users/user/AppData/Local/Temp/msohtmlclip1/01/clip_image006.png&quot; &gt; &lt;strong&gt;,&lt;/strong&gt; &lt;img alt=&quot;&quot; chromakey=&quot;white&quot; src=&quot;file:///C:/Users/user/AppData/Local/Temp/msohtmlclip1/01/clip_image007.png&quot; &gt; &lt;strong&gt;, &amp;hellip;,&lt;/strong&gt; &lt;img alt=&quot;&quot; chromakey=&quot;white&quot; src=&quot;file:///C:/Users/user/AppData/Local/Temp/msohtmlclip1/01/clip_image008.png&quot; &gt; &lt;strong&gt;&amp;nbsp;,&lt;/strong&gt; &lt;img alt=&quot;&quot; chromakey=&quot;white&quot; src=&quot;file:///C:/Users/user/AppData/Local/Temp/msohtmlclip1/01/clip_image009.png&quot; &gt; &lt;strong&gt;]&lt;/strong&gt;&lt;strong&gt;&lt;span dir=&quot;RTL&quot;&gt;&lt;/span&gt;&lt;/strong&gt;&lt;/td&gt;
		&lt;/tr&gt;
		&lt;tr&gt;
			&lt;td style=&quot;width: 47px; height: 27px; text-align: justify;&quot;&gt;&lt;strong&gt;(3)&lt;/strong&gt;&lt;strong&gt;&lt;span dir=&quot;RTL&quot;&gt;&lt;/span&gt;&lt;/strong&gt;&lt;/td&gt;
			&lt;td style=&quot;width: 246px; height: 27px; text-align: justify;&quot;&gt;Cost = f (Country)&lt;span dir=&quot;RTL&quot;&gt;&lt;/span&gt;&lt;/td&gt;
		&lt;/tr&gt;
	&lt;/tbody&gt;
&lt;/table&gt;

&lt;div style=&quot;text-align: justify;&quot;&gt;&lt;div style=&quot;clear:both;&quot;&gt;&lt;/div&gt;&lt;strong&gt;The variables attributed to each country can be structural features, lexical features, semantic features, or the weight of each word, and so on. Accordingly, the power of each country for identifying the class of each text is increased or decreased based on its variables. &lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;One of the most important phases of the imperialist competitive algorithm (ICA) is the colonial competition phase. In this phase, all the imperialists try to increase the number of colonies they own. Each of the more powerful empires tries to seize the colonies of the weakest empires to increase its own power. In the proposed method, colonies with the highest number of classification errors and the highest number of features are considered the weakest empires. &lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;Based on trial and error, and considering the target function in the proposed method, the number of key features relevant to the main topic of the texts is set to &lt;/strong&gt; &lt;img alt=&quot;&quot; chromakey=&quot;white&quot; src=&quot;file:///C:/Users/user/AppData/Local/Temp/msohtmlclip1/01/clip_image004.png&quot; &gt; &lt;strong&gt;&amp;nbsp;of the total extracted features, and only through using&amp;nbsp; &lt;/strong&gt; &lt;img alt=&quot;&quot; chromakey=&quot;white&quot; src=&quot;file:///C:/Users/user/AppData/Local/Temp/msohtmlclip1/01/clip_image004.png&quot; &gt; &lt;strong&gt;&amp;nbsp;of the key features of each text along with a classifier algorithm such as&lt;/strong&gt; &lt;img alt=&quot;&quot; chromakey=&quot;white&quot; src=&quot;file:///C:/Users/user/AppData/Local/Temp/msohtmlclip1/01/clip_image010.png&quot; &gt; &lt;strong&gt;, support vector machine (SVM), &lt;/strong&gt; &lt;img alt=&quot;&quot; chromakey=&quot;white&quot; src=&quot;file:///C:/Users/user/AppData/Local/Temp/msohtmlclip1/01/clip_image011.png&quot; &gt; &lt;strong&gt;&amp;nbsp;nearest neighbors, and so on, the class of that text can be determined in the proposed method. &lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;Since the classification of texts is a nonlinear problem, in order to classify texts, the problem must first be mapped into a linear problem. In this paper, the RBF kernel function along with &lt;/strong&gt; &lt;img alt=&quot;&quot; chromakey=&quot;white&quot; src=&quot;file:///C:/Users/user/AppData/Local/Temp/msohtmlclip1/01/clip_image012.png&quot; &gt; &lt;strong&gt;&amp;nbsp;is used for mapping the problem. &lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;The hybrid algorithm is implemented on the Reuters21578, WebKB, and Cade 12 data sets to evaluate the accuracy of the proposed method. The simulation results indicate that the proposed hybrid algorithm outperforms the support vector machine without feature selection in terms of precision, recall, and F-measure.&lt;/strong&gt;&lt;br&gt;
&amp;nbsp;&lt;/div&gt;</abstract>
	<keyword_fa>انتخاب ویژگی, دسته‌بندی متون, الگوریتم رقابت استعماری, الگوریتم ماشین بردار پشتیبان, بهینه‌سازی</keyword_fa>
	<keyword>Feature Selection, Text Classification, Imperialism Competitive Algorithm, Support Vector Machines Algorithm, Optimization</keyword>
	<start_page>117</start_page>
	<end_page>130</end_page>
	<web_url>http://jsdp.rcisp.ac.ir/browse.php?a_code=A-10-1536-1&amp;slc_lang=fa&amp;sid=1</web_url>


<author_list>
	<author>
	<first_name>Zahra</first_name>
	<middle_name></middle_name>
	<last_name>Asheghi Dizaji</last_name>
	<suffix></suffix>
	<first_name_fa>زهرا</first_name_fa>
	<middle_name_fa></middle_name_fa>
	<last_name_fa>عاشقی دیزجی</last_name_fa>
	<suffix_fa></suffix_fa>
	<email>zahra_ashegi@yahoo.com</email>
	<code>10031947532846008783</code>
	<orcid>10031947532846008783</orcid>
	<coreauthor>Yes</coreauthor>
	<affiliation>Department of Computer Engineering, Urmia Branch, Islamic Azad University, Urmia, Iran.</affiliation>
	<affiliation_fa>گروه مهندسی کامپیوتر، واحد ارومیه، دانشگاه آزاد اسلامی، ارومیه، ایران</affiliation_fa>
	 </author>


	<author>
	<first_name>Sakineh</first_name>
	<middle_name></middle_name>
	<last_name>Asghari Aghjehdizaj</last_name>
	<suffix></suffix>
	<first_name_fa>سکینه</first_name_fa>
	<middle_name_fa></middle_name_fa>
	<last_name_fa>اصغری آقجه‎دیزج</last_name_fa>
	<suffix_fa></suffix_fa>
	<email>Sakineh154asghari@gmail.com</email>
	<code>10031947532846008784</code>
	<orcid>10031947532846008784</orcid>
	<coreauthor>No</coreauthor>
	<affiliation>Department of Computer Engineering, Bonab Branch, Islamic Azad University, Maragheh, Iran</affiliation>
	<affiliation_fa>گروه مهندسی کامپیوتر، واحد بناب، دانشگاه آزاد اسلامی، بناب، ایران</affiliation_fa>
	 </author>


	<author>
	<first_name>Farhad</first_name>
	<middle_name></middle_name>
	<last_name>Soleimanian Gharehchopogh</last_name>
	<suffix></suffix>
	<first_name_fa>فرهاد</first_name_fa>
	<middle_name_fa></middle_name_fa>
	<last_name_fa>سلیمانیان قره چپق</last_name_fa>
	<suffix_fa></suffix_fa>
	<email>farhad@iaurmia.ac.ir</email>
	<code>10031947532846008785</code>
	<orcid>10031947532846008785</orcid>
	<coreauthor>No</coreauthor>
	<affiliation>Department of Computer Engineering, Urmia Branch, Islamic Azad University, Urmia, Iran.</affiliation>
	<affiliation_fa>گروه مهندسی کامپیوتر، واحد ارومیه، دانشگاه آزاد اسلامی، ارومیه، ایران</affiliation_fa>
	 </author>


</author_list>


	</article>
</articleset>
</journal>
