Presenting a Method based on Genetic Algorithm for finding the most Stable Clusters in Ensemble Clustering

Samimi, Navid; Nejatian, Samad; Parvin, Hamid; Bagheri Fard, Karamolah; Rezaei, Vahideh

doi:10.61186/jsdp.21.3.111

Volume 21, Issue 3 (12-2024) JSDP 2024, 21(3): 111-136

Samimi N, Nejatian S, Parvin H, Bagheri Fard K, Rezaei V. Presenting a Method based on Genetic Algorithm for finding the most Stable Clusters in Ensemble Clustering. JSDP 2024; 21 (3) : 6
URL: http://jsdp.rcisp.ac.ir/article-1-1217-en.html

Navid Samimi, Samad Nejatian, Hamid Parvin*, Karamolah Bagheri Fard, Vahideh Rezaei

Department of Computer Engineering, Nourabad Mamasani Branch, Islamic Azad University, Yasuj, Iran

Abstract:

Clustering is a fundamental tool in data analysis and data mining: by grouping data according to intrinsic similarities, it extracts hidden and meaningful structures from large datasets. However, selecting optimal clusters with conventional clustering algorithms is challenging, especially when clusters are dense or heterogeneous. This study proposes a novel genetic algorithm-based method for identifying the most stable clusters in ensemble clustering. By leveraging cluster stability criteria and a correlation matrix, the approach improves the accuracy and stability of the final clustering. The method first generates initial partitions of the data using six different clustering algorithms. Next, the Fisher criterion is applied to identify the more stable clusters. The selected clusters are then evaluated and optimized by a genetic algorithm to construct an optimized correlation matrix, which is fed into a hierarchical clustering algorithm that produces the final consensus clustering. The method was tested on standard datasets, where it improved NMI and ARI by 12% and 5%, respectively, compared to previous methods. The genetic algorithm enabled the identification of clusters with higher stability and diversity, reducing the impact of noise and increasing the accuracy of the final clustering. The method also outperformed the individual base clustering algorithms. Because it enhances both the accuracy and the stability of clustering, the method has potential applications in domains such as big data analysis, machine learning, and information retrieval. Its main strengths are the use of the Fisher criterion to select stable clusters and of a genetic algorithm for optimization; it preserves diversity among clusters while significantly enhancing clustering accuracy. Future studies could combine this approach with more advanced algorithms to assess its applicability to more complex datasets.
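
The consensus stage described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Fisher-criterion selection and the genetic algorithm are omitted because the abstract does not specify them, and a plain co-association matrix (the fraction of base partitions in which two points share a cluster) is assumed as a stand-in for the optimized correlation matrix. The function names `co_association`, `average_linkage`, and `nmi`, the choice of average linkage, and the toy partitions are all hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log, sqrt


def co_association(partitions):
    """Fraction of base partitions in which each pair of points shares a cluster."""
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(partitions) for c in row] for row in counts]


def average_linkage(similarity, k):
    """Agglomerative clustering on a similarity matrix until k clusters remain."""
    clusters = [[i] for i in range(len(similarity))]

    def avg_sim(a, b):
        return sum(similarity[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        # Merge the pair of clusters with the highest average similarity.
        a, b = max(combinations(range(len(clusters)), 2),
                   key=lambda p: avg_sim(clusters[p[0]], clusters[p[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe because a < b
    labels = [0] * len(similarity)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels


def nmi(u, v):
    """Normalized mutual information with geometric-mean normalization."""
    n = len(u)
    cu, cv, joint = Counter(u), Counter(v), Counter(zip(u, v))
    mi = sum(c / n * log(c * n / (cu[a] * cv[b])) for (a, b), c in joint.items())
    hu = -sum(c / n * log(c / n) for c in cu.values())
    hv = -sum(c / n * log(c / n) for c in cv.values())
    if hu == 0 or hv == 0:
        return 1.0 if hu == hv else 0.0
    return mi / sqrt(hu * hv)


# Toy ensemble: three base partitions of six points (made-up labels).
partitions = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
]
consensus = average_linkage(co_association(partitions), k=2)
print(consensus)  # [0, 0, 0, 1, 1, 1]
print(nmi(consensus, [0, 0, 0, 1, 1, 1]))
```

In this toy case the second base partition misplaces one point, but the consensus recovers the grouping that the majority of partitions agree on. Selecting or weighting clusters before building the matrix, as the abstract describes, would change the entries of the matrix but not the final hierarchical step.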

Article number: 6

Keywords: Ensemble clustering, Cluster Stability, Fisher Criterion, Correlation matrix, Genetic Algorithm

Type of Study: Research | Subject: Paper
Received: 2021/03/19 | Accepted: 2024/02/25 | Published: 2025/01/17 | ePublished: 2025/01/17

Rights and permissions
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Signal and Data Processing
