Outlier Detection on Data Streams Using a QLattice-based Model and Online Learning

Fardin, Sahar; Hashemzadeh, Mahdi

doi:10.61186/jsdp.20.2.81

Volume 20, Issue 2 (9-2023) JSDP 2023, 20(2): 81-98



Fardin S, Hashemzadeh M. Outlier Detection on Data Streams Using a QLattice-based Model and Online Learning. JSDP 2023; 20 (2) : 6
URL: http://jsdp.rcisp.ac.ir/article-1-1226-en.html


Sahar Fardin, Mahdi Hashemzadeh*

Azarbaijan Shahid Madani University

Abstract:

With advances in computer science and the rapid growth of data mining and its applications, identifying outlier (anomalous) data has become one of the most important research topics. In most applications, outliers carry information that can be turned into useful knowledge. Many applications today operate on data streams, and in the vast majority of them discovering outliers is very important and, in some cases, vital. Anomaly detection is an important tool for detecting fraud, network intrusions, abnormal behavior in monitoring systems, and other rare events that are of great importance but often difficult to identify. Most existing efficient outlier detection algorithms were designed for static data. Outlier detection is more challenging on data streams, where data are generated continuously and have special properties such as unboundedness and transience. In this research, we introduce an approach based on the QLattice classification model, which is based on quantum computing and performs better in the intended application than other classification methods. Because the data distribution in a stream can change over time, the proposed method also applies an online incremental learning scheme. Given the unbounded data flow and limited processing memory, detection is applied to a window of data that is continually updated with data sampled from previous windows. A function that uses random sampling is also designed to address class imbalance. Experimental results on benchmark datasets show that the proposed approach outperforms other methods.
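The windowing and balancing steps described in the abstract can be pictured with a minimal Python sketch. This is not the authors' implementation: the QLattice classifier is left out entirely, all function and parameter names (stream_windows, carry_over, and so on) are hypothetical, and random oversampling of the minority class is an assumption, since the abstract only says that random sampling is used.

```python
import random


def balance_by_random_oversampling(rows, rng):
    """Duplicate randomly chosen rows of smaller classes until all classes match.

    Assumption: oversampling is used; the abstract only mentions random sampling.
    Each row is a (features, label) pair.
    """
    if not rows:
        return []
    by_label = {}
    for row in rows:
        by_label.setdefault(row[1], []).append(row)
    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        # Draw with replacement to top the group up to the largest class size.
        balanced.extend(rng.choices(group, k=target - len(group)))
    return balanced


def build_window(new_chunk, previous_window, carry_over, rng):
    """Combine the newest chunk with a random sample of the previous window."""
    k = min(carry_over, len(previous_window))
    return list(new_chunk) + rng.sample(previous_window, k)


def stream_windows(chunks, carry_over, seed=0):
    """Yield one balanced training window per incoming chunk of the stream."""
    rng = random.Random(seed)
    previous = []
    for chunk in chunks:
        window = build_window(chunk, previous, carry_over, rng)
        yield balance_by_random_oversampling(window, rng)
        # Keep the unbalanced window so carried-over rows are real observations.
        previous = window
```

In an online setting, each yielded window would be passed to an incrementally updated classifier; which classifier and how it is updated is outside the scope of this sketch.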

Article number: 6

Keywords: Outlier detection, Data streams, Online learning, Incremental learning, Data mining


Type of Study: Research | Subject: Paper
Received: 2021/04/18 | Accepted: 2022/05/11 | Published: 2023/10/22 | ePublished: 2023/10/22



Rights and permissions
	This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Signal and Data Processing
