Volume 16, Issue 2 (9-2019)                   JSDP 2019, 16(2): 105-120 | Back to browse issues page


XML Persian Abstract Print


Download citation:
BibTeX | RIS | EndNote | Medlars | ProCite | Reference Manager | RefWorks
Send citation to:

Pahlevanzadeh A, Niknafs A. Improvement of density-based clustering algorithm using modifying the density definitions and input parameter. JSDP 2019; 16 (2) :105-120
URL: http://jsdp.rcisp.ac.ir/article-1-760-en.html
Abstract:   (3745 Views)
Clustering is one of the main tasks in data mining, which means grouping similar samples. In general, there is a wide variety of clustering algorithms. One of these categories is density-based clustering. Various algorithms have been proposed for this method; one of the most widely used algorithms called DBSCAN. DBSCAN can identify clusters of different shapes in the dataset and automatically identify the number of clusters. There are advantages and disadvantages in this algorithm. It is difficult to determine the input parameters of this algorithm by the user. Also, this algorithm is unable to detect clusters with different densities in the data set. ISB-DBSCAN algorithm is another example of density-based algorithms that eliminates the disadvantages of the DBSCAN algorithm. ISB-DBSCAN algorithm reduces the input parameters of DBSCAN algorithm and uses an input parameter k as the nearest neighbor's number. This method is also able to identify different density clusters, but according to the definition of the new core point, It is not able to identify some clusters in a different data set.
This paper presents a method for improving ISB-DBSCAN algorithm. A proposed approach, such as ISB-DBSCAN, uses an input parameter k as the number of nearest neighbors and provides a new definition for core point. This method performs clustering in three steps, with the difference that, unlike ISB-DBSCAN algorithm, it can create a new cluster in the final stage. In the proposed method, a new criterion, such as the number of dataset dimensions used to detect noise in the used data set. Since the determination of the k parameter in the proposed method may be difficult for the user, a new method with genetic algorithm is also proposed for the automatic estimation of the k parameter. To evaluate the proposed methods, tests were carried out on 11 standard data sets and the accuracy of clustering in the methods was evaluated. The results showe that the proposed method is able to achieve better results in different data sets compare to other available methods. In the proposed method, the automatic determination of k parameter also obtained acceptable results.
Full-Text [PDF 3931 kb]   (1986 Downloads)    
Type of Study: Research | Subject: Paper
Received: 2017/08/10 | Accepted: 2019/06/19 | Published: 2019/09/17 | ePublished: 2019/09/17

References
1. [1] C. Cassisi, A. Ferro, R. Giugno, G. Pigola, and A. Pulvirenti, "Enhancing density-based clustering: Parameter reduction and outlier detection," Information Systems, vol. 38, no. 3, pp. 317-330, 2013. [DOI:10.1016/j.is.2012.09.001]
2. [2] X. Chen, W. Liu, H. Qiu, and J. Lai, "APSCAN: A parameter free algorithm for clustering," Pattern Recognition Letters, vol. 32, no. 7, pp. 973-986, 2011. [DOI:10.1016/j.patrec.2011.02.001]
3. [3] H. Durnová, "Otakar Boruvka (1899-1995) and the Minimum Spanning Tree," 1998.
4. [4] M. T. Elbatta and W. M. Ashour, " A dynamic method for discovering density varied clusters," International Journal of Signal Processing, Image Processing and Pattern Recognition, vol. 6, no. 1, pp. 123-134, 2013.
5. [5] J. Esmaelnejad, J. Habibi, and S. H. Yeganeh, "A novel method to find appropriate ε for DBSCAN," in Asian Conference on Intelligent Information and Database Systems, Springer, 2010, pp. 93-102. [DOI:10.1007/978-3-642-12145-6_10]
6. [6] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu, "A density-based algorithm for discovering clusters in large spatial databases with noise," in Kdd, 1996, vol. 96, no. 34, pp. 226-231.
7. [7] R. L. Graham and P. Hell, "On the history of the minimum spanning tree problem," Annals of the History of Computing, vol. 7, no. 1, pp. 43-57, 1985. [DOI:10.1109/MAHC.1985.10011]
8. [8]M. N. Gaonkar and K. Sawant, "AutoEpsDBSCAN: DBSCAN with Eps automatic for large dataset," International Journal on Advanced Computer Theory and Engineering, vol. 2, no. 2, pp. 11-16, 2013.
9. [9] J. Hou, H. Gao, and X. Li, "Dsets-dbscan: a parameter-free clustering algorithm," IEEE Transactions on Image Processing, vol. 25, no. 7, pp. 3182-3193, 2016. [DOI:10.1109/TIP.2016.2559803] [PMID]
10. [10] J. A. Hartigan and M. A. Wong, "Algorithm AS 136: A k-means clustering algorithm," Journal of the Royal Statistical Society. Series C (Applied Statistics), vol. 28, no. 1, pp. 100-108, 1979. [DOI:10.2307/2346830]
11. [11] A. Karami and R. Johansson, "Choosing dbscan parameters automatically using differential evolution," International Journal of Computer Applications, vol. 91, no. 7, 2014. [DOI:10.5120/15890-5059]
12. [12] Y. Lv et al., "An efficient and scalable density-based clustering algorithm for datasets with complex structures," Neurocomputing, vol. 171, pp. 9-22, 2016. [DOI:10.1016/j.neucom.2015.05.109]
13. [13] M. Mitchell, An introduction to genetic algo-rithms. MIT press, 1998.
14. [14] S. Pourmohammadi, A. Maleki, "A Fuzzy C-means Clustering Approach for Continuous Stress Detection during Driving", JSDP; 14 (4) :129-142,2018. [DOI:10.29252/jsdp.14.4.129]
15. [15] S. Mitra and J. Nandy, "Kddclus: A simple method for multi-density clustering," SKAD'11-Soft Computing Applications and Knowledge Discovery, pp. 72, 2011.
16. [16] G. H. Shah, "An improved DBSCAN, a density based clustering algorithm with parameter selection for high dimensional data sets," IEEE Engineering (NUiCONE), 2012 Nirma Univer-sity International Conference on, 2012, pp. 1-6. [DOI:10.1109/NUICONE.2012.6493211]
17. [17] K. Sawant, "Adaptive methods for determining dbscan parameters," International Journal of Innovative Science, Engineering & Technology, vol. 1, no. 4, 2014.
18. [18] P. Sharma and Y. Rathi, "Efficient Density-Based Clustering Using Automatic Parameter Detection," in Proceedings of the International Congress on Information and Communication Technology, Springer, 2016, pp. 433-441. [DOI:10.1007/978-981-10-0767-5_46]
19. [19] https://cs.joensuu.fi/sipu/datasets/
20. [20]http://archive.ics.uci.edu/ml/datasets/statlog+(heart)

Add your comments about this article : Your username or Email:
CAPTCHA

Send email to the article author


Rights and permissions
Creative Commons License This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

© 2015 All Rights Reserved | Signal and Data Processing