Volume 20, Issue 4 (3-2024)                   JSDP 2024, 20(4): 23-34





Hamidzadeh J, Rashidi Mahmoodi M A, Moradi M. Online Learning for Imbalanced Data Streams with Concept Drift by Belief Theory and Chaotic Function. JSDP 2024; 20 (4) : 2
URL: http://jsdp.rcisp.ac.ir/article-1-1246-en.html
Sadjad University
Abstract:
Continual learning from data streams is a pivotal aspect of machine learning and requires algorithms that can adapt to incoming data. Because data streams keep evolving, previously acquired knowledge may become outdated. This challenge, known as concept drift, must be detected promptly so that learning models can adapt. Various drift detectors have been proposed, but they often assume a relatively balanced class distribution. On imbalanced data streams, these detectors may be biased toward majority classes and overlook shifts in minority classes. Moreover, the imbalance among classes can change over time, with classes switching between majority and minority roles, especially when overlapping regions make the relationships among classes complex. In this paper, a novel classification method is introduced for imbalanced streaming data affected by concept drift. The proposed method continuously monitors arriving streams to detect and adapt to both imbalance and concept drift. Upon receiving a new block of data, it applies k-means clustering to identify non-dense regions and oversamples the minority classes. Cluster centers are selected using the belief function to address overlap between majority and minority classes. A chaotic approach then governs how new samples are added, based on each sample's neighborhood and on the size of thresholds that cover time intervals and classification errors. Finally, labels are predicted by ensemble learning with weighted majority voting. Experiments on benchmark datasets from the UCI database evaluate the proposed method using Leave-One-Out (LOO) validation and compare it with state-of-the-art methods. The results demonstrate the superiority of the proposed method across various evaluation criteria, highlighting its effectiveness in addressing imbalanced streaming data with concept drift.
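The final step named in the abstract, weighted majority voting over an ensemble, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `weighted_majority_vote`, the use of each member's recent accuracy as its weight, and the tie-breaking rule are all assumptions made for illustration, since the abstract does not state how the weights are computed.

```python
from collections import defaultdict


def weighted_majority_vote(predictions, weights):
    """Return the label whose supporters carry the largest total weight.

    predictions: one predicted label per ensemble member.
    weights: one non-negative weight per member. Assumption for this
             sketch: a weight could be the member's accuracy on the most
             recent block; the abstract does not specify the scheme.
    """
    if len(predictions) != len(weights):
        raise ValueError("predictions and weights must have the same length")
    if not predictions:
        raise ValueError("at least one ensemble member is required")

    # Accumulate the weight behind each candidate label.
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight

    # Ties go to the label that was seen first (an arbitrary choice here).
    return max(totals, key=totals.get)


# Hypothetical example: one strong member outvotes two weaker ones.
label = weighted_majority_vote(
    ["minority", "majority", "majority"],
    [0.9, 0.3, 0.4],
)
print(label)
```

In this hypothetical example the single member voting "minority" has weight 0.9, which exceeds the combined weight 0.7 behind "majority", so the minority label wins even though it has fewer votes. That property is one reason weighted voting is commonly considered for imbalanced streams.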
Article number: 2
Type of Study: Research | Subject: Paper
Received: 2021/07/1 | Accepted: 2023/07/5 | Published: 2024/04/25 | ePublished: 2024/04/25




Rights and permissions
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

© 2015 All Rights Reserved | Signal and Data Processing