Improving Imbalanced Data Classification Using Fuzzy Similarity Measures and Subtractive Clustering

Yasrebi Naeini, Ehsan; Hatami, Mahla

doi:10.52547/jsdp.19.2.27

Volume 19, Issue 2 (7-1401), pp. 27-38

Yasrebi Naeini E, Hatami M. Improving Imbalanced Data Classification Accuracy by using Fuzzy Similarity Measure and Subtractive Clustering. JSDP 2022; 19 (2): 27-38
URL: http://jsdp.rcisp.ac.ir/article-1-1010-fa.html


Ehsan Yasrebi Naeini*, Mahla Hatami

Department of Electrical and Computer Engineering, University of Torbat Heydarieh

Abstract:

Classification is one of the most important tasks in data mining and knowledge discovery from databases. In most cases, the data used to train classifiers are not well distributed. This imbalance arises when one class has a large number of samples while the other class inherently has few. In general, methods for handling this kind of problem fall into two groups: under-sampling and over-sampling. This paper presents an under-sampling method that combines clustering with fuzzy similarity measures, and analyzes how effectively these measures perform in classifying imbalanced data. To this end, subtractive clustering is first applied to cluster the majority-class data; the samples in each cluster are then ranked using fuzzy similarity measures, and suitable samples are selected on the basis of these ranks. The selected samples, together with the minority class, form the final dataset. The method was implemented in MATLAB, the results were evaluated with the AUC measure, and the outcomes were analyzed using standard statistical tests. The results show that the proposed method performs better than other well-known methods.
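The pipeline described in the abstract (cluster the majority class, rank the samples in each cluster by a fuzzy similarity measure, keep the best-ranked ones) can be illustrated with a minimal Python sketch. This is not the authors' MATLAB implementation, and the abstract does not give the details, so several choices here are assumptions: the simplified Chiu-style stopping rule (`stop_ratio`), the use of min-max scaled feature values as fuzzy memberships, the Jaccard-type measure sum(min)/sum(max), ranking against the cluster centre, and keeping the most similar samples with a per-cluster quota proportional to cluster size. All function names are hypothetical.

```python
import numpy as np


def subtractive_centers(X, ra=0.5, squash=1.5, stop_ratio=0.15):
    """Simplified Chiu-style subtractive clustering on data scaled to [0, 1].

    Returns the row indices of the samples chosen as cluster centres.
    The single-threshold stopping rule is an assumption of this sketch.
    """
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    potential = np.exp(-4.0 * d2 / ra**2).sum(axis=1)
    rb = squash * ra
    first_peak = potential.max()
    centers = []
    while True:
        idx = int(potential.argmax())
        peak = potential[idx]
        if peak < stop_ratio * first_peak:
            break
        centers.append(idx)
        # Subtract the influence of the new centre from every potential.
        potential = potential - peak * np.exp(-4.0 * d2[idx] / rb**2)
    return np.array(centers)


def jaccard_fuzzy_similarity(A, b):
    """Fuzzy Jaccard similarity sum(min) / sum(max) of each row of A with b."""
    num = np.minimum(A, b).sum(axis=1)
    den = np.maximum(A, b).sum(axis=1)
    # Two all-zero membership vectors are treated as identical (similarity 1).
    return np.where(den > 0, num / np.where(den > 0, den, 1.0), 1.0)


def undersample_majority(X_maj, n_keep, ra=0.5):
    """Return indices of majority samples to keep (roughly n_keep of them)."""
    lo, hi = X_maj.min(axis=0), X_maj.max(axis=0)
    # Assumption: min-max scaled features act as fuzzy membership degrees.
    U = (X_maj - lo) / np.where(hi > lo, hi - lo, 1.0)
    centers = subtractive_centers(U, ra=ra)
    d2 = ((U[:, None, :] - U[centers][None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)  # assign each sample to its nearest centre
    keep = []
    for k in range(len(centers)):
        members = np.flatnonzero(labels == k)
        sims = jaccard_fuzzy_similarity(U[members], U[centers[k]])
        quota = max(1, round(n_keep * len(members) / len(U)))
        order = np.argsort(-sims, kind="stable")  # most similar first
        keep.extend(members[order[:quota]])
    return np.sort(np.array(keep))


# Usage on synthetic majority-class data (two blobs, 100 samples).
rng = np.random.default_rng(0)
X_maj = np.vstack([rng.normal(0.0, 0.3, (60, 2)),
                   rng.normal(3.0, 0.3, (40, 2))])
kept = undersample_majority(X_maj, n_keep=20)
X_maj_reduced = X_maj[kept]
```

Because each cluster's quota is rounded and at least one sample is kept per cluster, the number of retained samples may differ slightly from `n_keep`. The reduced majority set would then be concatenated with the untouched minority class to form the training set, and a classifier trained on it could be scored with AUC as the abstract describes.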

Article number: 3

Keywords: imbalanced data classification, fuzzy similarity measures, sampling, subtractive clustering

Full text [PDF 1059 kb]

Type of study: Research | Subject: Papers on digital data processing
Received: 1398/9/11 | Accepted: 1399/5/28 | Published: 1401/7/8 | ePublished: 1401/7/8


Rights and permissions
	This article may be redistributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License.
