Volume 20, Issue 1 (6-2023) | JSDP 2023, 20(1): 99-122


Rashidi F, Nejatian S, Parvin H, Rezaei V, Bagheri Fard K. Using Simulated Annealing algorithm to improve ensemble clustering. JSDP 2023; 20 (1) : 6
URL: http://jsdp.rcisp.ac.ir/article-1-1219-en.html
Department of Computer Engineering, Islamic Azad University of Noorabad Mamasani, Fars, Iran
Abstract:
Data clustering is one of the main tasks of data mining, responsible for discovering hidden patterns in unlabeled data. Because of the complexity of the problem and the weaknesses of basic clustering methods, most current studies are directed toward clustering ensemble methods. Although for most datasets there are individual clustering algorithms that provide acceptable results, the ability of any single clustering algorithm is limited. The main purpose of a clustering ensemble is therefore to search for better and more stable results by combining the information and results obtained from several initial clusterings. In this paper, a clustering ensemble-based method is proposed which, like most evidence-accumulation methods, has two steps: (1) building a simultaneous participation (co-association) matrix and (2) determining the final output from that matrix. In the proposed method, information beyond the clusterings of the samples is used to construct the simultaneous participation matrix; this information can include the degree of similarity of the samples, the sizes of the initial clusters, the degree of stability of the initial clusters, and so on. The clustering problem is formulated as an explicit optimization problem through a Gaussian mixture model and solved using the simulated annealing algorithm. In addition, an evolutionary method based on simulated annealing is presented to determine the final output from the proposed simultaneous participation matrix; its most important component is the objective function, which guarantees that the final output is of high quality. The experimental results show that the proposed method outperforms other similar methods in terms of several clustering quality evaluation criteria.
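
To make the two-step pipeline described above concrete, the following is a minimal, hypothetical Python sketch of the general evidence-accumulation idea, not the authors' implementation: the base clusterers, the pairwise agreement objective, the cooling schedule, and all parameter values are illustrative assumptions, and the extra information the paper folds into the matrix (sample similarity, cluster size, cluster stability) is omitted.

# Illustrative sketch only (not the paper's code): step 1 accumulates a
# co-association ("simultaneous participation") matrix from several base
# clusterings; step 2 refines a candidate labeling with simulated annealing
# against a simple agreement objective. All choices below are placeholders.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

rng = np.random.default_rng(0)
X, _ = make_blobs(n_samples=150, centers=3, random_state=0)
n = X.shape[0]

# Step 1: build the co-association matrix from several base k-means runs.
co_assoc = np.zeros((n, n))
n_runs = 10
for seed in range(n_runs):
    labels = KMeans(n_clusters=int(rng.integers(2, 6)), n_init=5,
                    random_state=seed).fit_predict(X)
    co_assoc += (labels[:, None] == labels[None, :])
co_assoc /= n_runs  # entry (i, j) = fraction of runs grouping i and j together

# Step 2: simulated annealing over candidate consensus labelings.
def objective(labels, co):
    same = labels[:, None] == labels[None, :]
    # Reward pairs grouped together that often co-occurred, penalize the rest.
    return np.sum(np.where(same, co, 1.0 - co))

k = 3
labels = rng.integers(0, k, size=n)
T, cooling = 1.0, 0.995
current = objective(labels, co_assoc)
for step in range(5000):
    i = int(rng.integers(n))
    proposal = labels.copy()
    proposal[i] = rng.integers(0, k)   # perturb one sample's cluster label
    cand = objective(proposal, co_assoc)
    # Always accept improvements; accept worse moves with Boltzmann probability.
    if cand >= current or rng.random() < np.exp((cand - current) / T):
        labels, current = proposal, cand
    T *= cooling

print("final objective:", current)

In this sketch the objective simply measures agreement with the co-association matrix; the paper's actual objective function, which is the key part of its evolutionary step, is more elaborate and is defined through the Gaussian mixture formulation.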
 
Article number: 6
Full-Text [PDF 3181 kb]
Type of Study: Research | Subject: Paper
Received: 2021/03/25 | Accepted: 2023/06/02 | Published: 2023/08/13 | ePublished: 2023/08/13


Rights and permissions
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
