Cluster ensemble selection using voting

Latifi Pakdehi, Alireza; Daneshpour, Negin

doi:10.29252/jsdp.15.4.17

Volume 15, Issue 4 (3-2019) JSDP 2019, 15(4): 17-30

Latifi Pakdehi A, Daneshpour N. Cluster ensemble selection using voting. JSDP 2019; 15(4): 17-30
URL: http://jsdp.rcisp.ac.ir/article-1-541-en.html

Alireza Latifi Pakdehi, Negin Daneshpour*

Shahid Rajaee Teacher Training University

Abstract:

Clustering is the process of dividing a dataset into subsets, called clusters, so that objects within a cluster are similar to each other and different from objects in other clusters. Many clustering algorithms based on different approaches have been developed. An effective option is to combine two or more of these algorithms to solve the clustering problem. Ensemble clustering combines the results of existing clusterings to achieve better performance and higher accuracy. Research over the past decade has shown that, instead of combining all existing clusterings, selecting only a subset based on quality and diversity makes the ensemble result more accurate. This paper proposes a new method for ensemble clustering based on quality and diversity. First, many different base clusterings are needed for combination. They are generated by running the k-means algorithm with a random k in each execution. The base clusterings are then put into groups according to their similarity using a new grouping method, so that clusterings that are similar to each other fall into the same group. In this step, normalized mutual information (NMI) or the adjusted Rand index (ARI) is used to compute similarities and dissimilarities between the base clusterings. Then, from each group, the best-qualified clustering is selected by a voting-based method. In this method, cluster validity indices measure the quality of a clustering: all members of a group are evaluated by the cluster validity indices, and the clustering that optimizes the largest number of indices is selected. Finally, consensus functions combine all selected clusterings. A consensus function is an algorithm that combines existing clusterings to produce the final clusters. In this paper, three consensus functions, CSPA, MCLA, and HGPA, are used to combine the clusterings.
To evaluate the proposed method, real datasets from the UCI repository are used. In the experiments section, the proposed method is compared with well-known and powerful existing methods. The experimental results show that the proposed algorithm achieves better performance and higher accuracy than previous work.
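
The grouping and voting steps described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the grouping rule, so a greedy threshold rule is used here as a stand-in, and the threshold value, the geometric-mean normalization of NMI, the tie handling, and all function names are hypothetical. Validity index scores are assumed to be precomputed and passed in as plain numbers.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a list of cluster labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def nmi(a, b):
    """Normalized mutual information between two labelings of the same objects.

    Assumption: normalized by the geometric mean of the two entropies.
    """
    n = len(a)
    count_a, count_b, count_ab = Counter(a), Counter(b), Counter(zip(a, b))
    mi = sum(
        (n_xy / n) * math.log(n_xy * n / (count_a[x] * count_b[y]))
        for (x, y), n_xy in count_ab.items()
    )
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0.0 or h_b == 0.0:
        # Degenerate case: at least one labeling has a single cluster.
        return 1.0 if h_a == h_b else 0.0
    return mi / math.sqrt(h_a * h_b)


def group_by_similarity(clusterings, threshold=0.8):
    """Greedy grouping (hypothetical rule): a clustering joins the first group
    whose first member has NMI >= threshold with it, else starts a new group."""
    groups = []  # each group is a list of indices into `clusterings`
    for i, labels in enumerate(clusterings):
        for group in groups:
            if nmi(clusterings[group[0]], labels) >= threshold:
                group.append(i)
                break
        else:
            groups.append([i])
    return groups


def select_by_voting(scores, higher_is_better):
    """Pick the member of one group that optimizes the most validity indices.

    scores[i][j] is the value of validity index j for member i.
    higher_is_better[j] says whether index j is maximized or minimized.
    Assumption: tied best members each get a vote; the lowest position
    wins a tie in total votes.
    """
    votes = Counter()
    for j, maximize in enumerate(higher_is_better):
        column = [row[j] for row in scores]
        best = max(column) if maximize else min(column)
        for i, value in enumerate(column):
            if value == best:
                votes[i] += 1
    return max(range(len(scores)), key=lambda i: (votes[i], -i))


# Three toy base clusterings of four objects; the first two are the same
# partition with swapped labels, the third is unrelated to them.
clusterings = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 1, 0, 1]]
groups = group_by_similarity(clusterings)

# Illustrative scores for a group of three members under three indices:
# one maximized, one minimized, one maximized.
scores = [[0.7, 1.2, 300.0], [0.6, 0.9, 350.0], [0.4, 1.5, 200.0]]
winner = select_by_voting(scores, higher_is_better=[True, False, True])
```

In this toy run the first two clusterings land in the same group because NMI ignores label names, and the second member of the scored group wins two of the three indices. The selected clusterings from every group would then be passed to a consensus function such as CSPA, MCLA, or HGPA, which is not sketched here.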

Keywords: Ensemble clustering, select member, validity index

Type of Study: Research | Subject: Paper
Received: 2016/12/31 | Accepted: 2019/01/09 | Published: 2019/03/08 | ePublished: 2019/03/08

Rights and permissions
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Signal and Data Processing
