<?xml version="1.0" encoding="utf-8"?>
<journal>
<title>Signal and Data Processing</title>
<title_fa>پردازش علائم و داده‌ها</title_fa>
<short_title>JSDP</short_title>
<subject>Engineering &amp; Technology</subject>
<web_url>http://jsdp.rcisp.ac.ir</web_url>
<journal_hbi_system_id>1</journal_hbi_system_id>
<journal_hbi_system_user>admin</journal_hbi_system_user>
<journal_id_issn>2538-4201</journal_id_issn>
<journal_id_issn_online>2538-421X</journal_id_issn_online>
<journal_id_pii></journal_id_pii>
<journal_id_doi>10.61882/jsdp</journal_id_doi>
<journal_id_iranmedex></journal_id_iranmedex>
<journal_id_magiran></journal_id_magiran>
<journal_id_sid>1</journal_id_sid>
<journal_id_nlai>8888</journal_id_nlai>
<journal_id_science></journal_id_science>
<language>fa</language>
<pubdate>
	<type>jalali</type>
	<year>1402</year>
	<month>3</month>
	<day>1</day>
</pubdate>
<pubdate>
	<type>gregorian</type>
	<year>2023</year>
	<month>6</month>
	<day>1</day>
</pubdate>
<volume>20</volume>
<number>1</number>
<publish_type>online</publish_type>
<publish_edition>1</publish_edition>
<article_type>fulltext</article_type>
<articleset>
	<article>


	<language>fa</language>
	<article_id_doi></article_id_doi>
	<title_fa>خوشه بندی گروهی طیفی لاپلاسی-p نیمه نظارتی برای داده های با ابعاد بالا</title_fa>
	<title>Ensembling semi-supervised p-spectral clustering for high dimensional data</title>
	<subject_fa>مقالات گروه علائم حیاتی ( مرتبط با مهندسی پزشکی)</subject_fa>
	<subject>Paper</subject>
	<content_type_fa>پژوهشي</content_type_fa>
	<content_type>Research</content_type>
	<abstract_fa>با توجه به افزایش روزافزون اطلاعات و تحلیل دقیق آنها مسأله خوشه‌بندی که برای آشکارسازی الگوهای پنهان موجود در داده‌ها مورد استفاده قرار می‌گیرد، همچنان از اهمیت بالایی برخوردار است. از طرفی خوشه‌بندی داده‌های با ابعاد بالا با استفاده از روش‌های سنتی پیشین دارای محدودیت‌های زیادی است. در مقاله حاضر، یک روش خوشه‌بندی گروهی نیمه‌نظارتی برای مجموعه‌ای از داده‌های پزشکی با ابعاد بالا پیشنهاد می‌شود. در فرموله‌سازی مسأله خوشه‌بندی اطلاعات نظارتی اندکی به عنوان دانش پیشین با استفاده از اطلاعات مربوط به تشابه و یا عدم تشابه (بصورت تعدادی زوج محدودیت‌های دوبه‌دو) در نظر گرفته می‌شود. در ابتدا با استفاده از خاصیت تراگذری زوج محدودیت‌های دوبه‌دو را بر روی تمام داده‌ها تعمیم می‌دهیم. سپس با تقسیم فضای ویژگی به صورت تصادفی به چندین زیرفضای نابرابر ابعاد داده‌ها را کاهش می‌دهیم. خوشه‌بندی طیفی نیمه‌نظارتی مبتنی بر گراف لاپلاسی-p در هر زیرفضا بطور مستقل انجام می‌شود. سپس با استفاده از نتایج هر کدام یک ماتریس مجاورت، حاصل از تجمیع نتایج هر کدام (مبتنی بر یادگیری گروهی) ایجاد می‌شود. در نهایت با استفاده از چند عملگر جستجو روی زیرفضاها، بهترین زیرفضا، یعنی زیرفضایی که بهترین نتیجه خوشه‌بندی را دارد، می‌یابیم. نتایج آزمایشات متعدد بر روی چندین داده‌ی پزشکی با ابعاد بالا نشان می‌دهد که رویکرد پیشنهادی، عملکرد و کارآیی بهتری نسبت به روش‌های پیشین دارد.</abstract_fa>
	<abstract>As the volume of information and the need for its detailed analysis grow, clustering, which detects the hidden patterns in data, remains of great importance. However, traditional methods have many limitations when clustering high-dimensional data. In this study, a semi-supervised ensemble clustering method is proposed for a set of high-dimensional medical data. In the proposed method, a small amount of supervisory information is available as prior knowledge, in the form of similarity or dissimilarity information (a number of pairwise constraints). First, using the transitive property, we generalize the pairwise constraints to all data. Then, to find the optimal clustering solution, we randomly divide the feature space into several sub-spaces with unequal numbers of features. Semi-supervised spectral clustering based on the graph p-Laplacian is performed in each sub-space independently. The graph p-Laplacian is a nonlinear generalization of the graph Laplacian, and we use it to increase the accuracy of spectral clustering. Each clustering solution is compared with the pairwise constraints and, according to the level of agreement, is assigned a degree of confidence. Based on these degrees of confidence, an ensemble adjacency matrix is formed that aggregates the clustering solutions of all sub-spaces. This ensemble adjacency matrix is used in a final spectral clustering step to obtain the overall clustering solution. Since the sub-spaces are generated randomly with unequal numbers of features, the clustering results are strongly influenced by the initial choice of sub-spaces. It is therefore necessary to find the optimal set of sub-spaces, and a search algorithm is designed for this purpose. The search is initialized by forming several sets of sub-spaces (we call each set an environment). An optimal environment is the one that yields the best clustering results. The search algorithm uses three search operators to find the optimal environment. These operators search the environments and their sub-spaces both locally and globally, by combining two environments and/or replacing an environment with a newly generated one. Each search operator tries to find the best possible environment either in the entire search space or in a local region.&lt;br&gt;
We evaluate the performance of the proposed clustering scheme on 20 cancer gene datasets. Normalized mutual information (NMI) and the adjusted Rand index (ARI) are used as evaluation criteria. We first examine the effect of the number of pairwise constraints. As expected, the performance of the proposed method improves as the number of pairwise constraints increases. For example, on the Khan-2001 dataset the NMI value increases from 0.6 to 0.9 when the number of pairwise constraints increases from 20 to 100. More pairwise constraints mean that more information is available, which helps to improve the performance of the clustering algorithm. Furthermore, we examine the effect of the number of random sub-spaces. Increasing the number of random sub-spaces has a positive effect on clustering performance with respect to NMI. On most datasets, performance stabilizes once the number of sub-spaces reaches 20. Examining the effect of the sampling rate for random sub-space generation shows that the proposed method performs best on most cancer datasets, such as Armstrong-2002-v3 and Bredel-2005, when the sampling rate is 0.5; performance decreases as the rate deviates from 0.5. Next, the proposed method is compared with the method of reference [21] with respect to ARI, and it performs better on 12 of the 20 datasets. Finally, the proposed method is compared with several metric learning approaches with respect to NMI. It obtains the best result among the compared methods on 11 of the 20 datasets and the second-best result on 6 of the 20 datasets. For example, on the Bredel-2005 dataset, the NMI obtained by the proposed method is 0.1042 higher than that of reference [21], 0.1846 higher than RCA, 0.4 higher than ITML, and 0.468 higher than DCA.&lt;br&gt;
Using ensemble clustering together with the confidence factor improves the ability of the proposed algorithm to achieve better results. Exploiting the transitive property of the pairwise constraints and selecting random sub-spaces of unequal sizes also play an important role in achieving better performance. Compared with standard spectral clustering, p-Laplacian spectral clustering produces better clusters with more balanced volumes. Another factor that contributes to the performance of the proposed method is the use of search operators to find the best set of sub-spaces, which leads to better results.</abstract>
	<keyword_fa>خوشه بندی, یادگیری زیرفضا, یادگیری گروهی, یادگیری نیمه نظارتی, زوج محدودیت های دوبه دو</keyword_fa>
	<keyword>Clustering, Subspace Learning, Ensemble Learning, Semi-supervised Learning, Pairwise Constraints</keyword>
	<start_page>39</start_page>
	<end_page>58</end_page>
	<web_url>http://jsdp.rcisp.ac.ir/browse.php?a_code=A-10-588-1&amp;slc_lang=fa&amp;sid=1</web_url>


<author_list>
	<author>
	<first_name>Sedigheh</first_name>
	<middle_name></middle_name>
	<last_name>Safari</last_name>
	<suffix></suffix>
	<first_name_fa>صدیقه</first_name_fa>
	<middle_name_fa></middle_name_fa>
	<last_name_fa>صفری</last_name_fa>
	<suffix_fa></suffix_fa>
	<email>s.safari@eng.uk.ac.ir</email>
	<code>100319475328460012397</code>
	<orcid>100319475328460012397</orcid>
	<coreauthor>No</coreauthor>
	<affiliation></affiliation>
	<affiliation_fa>دانشگاه شهید باهنر کرمان</affiliation_fa>
	 </author>


	<author>
	<first_name>Fatemeh</first_name>
	<middle_name></middle_name>
	<last_name>Afsari</last_name>
	<suffix></suffix>
	<first_name_fa>فاطمه</first_name_fa>
	<middle_name_fa></middle_name_fa>
	<last_name_fa>افسری</last_name_fa>
	<suffix_fa></suffix_fa>
	<email>afsari.f@gmail.com</email>
	<code>100319475328460012398</code>
	<orcid>100319475328460012398</orcid>
	<coreauthor>Yes</coreauthor>
	<affiliation></affiliation>
	<affiliation_fa>دانشگاه شهید باهنر کرمان</affiliation_fa>
	 </author>


</author_list>


	</article>
</articleset>
</journal>
