A new ensemble clustering method based on fuzzy cmeans clustering while maintaining diversity in ensemble

Najafi, Fatemeh; Parvin, Hamid; Mirzaei, Kamal; Nejatiyan, Samad; Rezaie, Seyede Vahideh

doi:10.29252/jsdp.17.4.103

Volume 17, Issue 4 (2-2021) JSDP 2021, 17(4): 103-122

Najafi F, Parvin H, Mirzaei K, Nejatiyan S, Rezaie S V. A new ensemble clustering method based on fuzzy cmeans clustering while maintaining diversity in ensemble. JSDP 2021; 17(4): 103-122
URL: http://jsdp.rcisp.ac.ir/article-1-976-en.html

Fatemeh Najafi, Hamid Parvin*, Kamal Mirzaei, Samad Nejatiyan, Seyede Vahideh Rezaie

Department of Computer Engineering, Mamasani Branch, Islamic Azad University

Abstract:

Ensemble clustering has been an active research direction in data mining, pattern recognition, machine learning and artificial intelligence over the last decade. An ensemble method first produces several base clusterings and then aggregates them with a function that yields a final clustering as similar as possible to all of the base clusterings. The input of this function is the set of base clusterings and its output is a clustering called the consensus clustering; the function itself is called the consensus (agreement) function. Ensemble clustering has been proposed to increase the efficiency, robustness, reliability and stability of clustering. Because clustering is unsupervised, and because no general-purpose base clustering algorithm is adequate for every data set, ensemble clustering instead searches for a consensus clustering that agrees as closely as possible with the base clusterings. Ensemble clustering techniques rest on the premise that a combination of several weaker models is better than a single strong model. This premise holds only if certain conditions are met, such as diversity among the ensemble members and adequate quality of each member. This article presents an ensemble clustering method that uses fuzzy c-means, a weak clustering method, as the base clusterer, and adopts several measures to increase the diversity of the ensemble. The proposed method retains the main benefit of fuzzy c-means, its speed, while avoiding its major weakness, the inability to detect non-spherical and non-uniform clusters. In the experiments, we compare the proposed ensemble clustering algorithm against several up-to-date and robust clustering algorithms on a range of data sets. The experimental results indicate that the proposed method outperforms these algorithms.
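The pipeline outlined in the abstract (several fuzzy c-means base clusterings, some source of diversity, then a consensus function) can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract does not specify their diversity measures or consensus function, so the sketch assumes diversity comes from random initialisation and a randomly chosen number of clusters per member, and it swaps in a generic evidence-accumulation consensus (co-association matrix followed by average-linkage clustering). The function names `fuzzy_cmeans`, `co_association` and `ensemble_cluster`, and all parameter defaults, are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform


def fuzzy_cmeans(X, c, m=2.0, n_iter=100, tol=1e-5, rng=None):
    """Plain fuzzy c-means. Returns (centers, U) with U of shape (n, c)."""
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    # Random initial memberships, each row normalised to sum to 1.
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        # Centers are membership-weighted means of the data.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)  # guard against division by zero
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U


def co_association(labelings):
    """Fraction of base clusterings in which each pair of points shares a cluster."""
    labelings = np.asarray(labelings)
    n = labelings.shape[1]
    C = np.zeros((n, n))
    for labels in labelings:
        C += labels[:, None] == labels[None, :]
    return C / len(labelings)


def ensemble_cluster(X, k_final, n_members=20, k_range=(2, 6), seed=0):
    """Assumed consensus scheme: evidence accumulation with average linkage."""
    rng = np.random.default_rng(seed)
    labelings = []
    for _ in range(n_members):
        # Assumed diversity source: random cluster count and random start.
        c = int(rng.integers(k_range[0], k_range[1] + 1))
        _, U = fuzzy_cmeans(X, c, rng=rng)
        labelings.append(U.argmax(axis=1))  # harden the fuzzy partition
    D = 1.0 - co_association(labelings)  # co-association -> dissimilarity
    np.fill_diagonal(D, 0.0)
    Z = linkage(squareform(D, checks=False), method="average")
    return fcluster(Z, t=k_final, criterion="maxclust")
```

Calling `ensemble_cluster(X, k_final=3)` on an `(n, d)` NumPy array returns one integer label per row. Hardening each fuzzy partition with `argmax` discards the membership degrees; a consensus function that works on the soft memberships directly would be a natural alternative, and the abstract leaves open which route the paper takes.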

Keywords: Ensemble Learning, Ensemble Clustering, Fuzzy Cmeans Clustering Algorithm, Data Validity

Type of Study: Research | Subject: Paper
Received: 2019/02/19 | Accepted: 2020/09/27 | Published: 2021/02/22 | ePublished: 2021/02/22

Rights and permissions
	This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Signal and Data Processing
