Signal and Data Processing

Search published articles

Showing 4 results for Concept Drift

Concept drift detection in business process logs using deep learning

Ms Fatemeh Khojasteh, Prof. Mohsen Kahani, Dr Behashid Behkamal
Volume 17, Issue 4 (2-2021)

Abstract

Process mining provides a bridge between process modeling and analysis on the one hand and data mining on the other. It aims at discovering, monitoring, and improving real processes by extracting knowledge from event logs. However, most business processes change over time (e.g., because of new legislation or seasonal effects), and traditional process mining techniques cannot capture such “second-order dynamics”; they analyze these processes as if they were in a steady state. Such changes can significantly affect process performance. Hence, for process management it is crucial that changes in processes be discovered and analyzed. Process change detection is also known as business process drift detection.
All existing methods for process drift detection depend on the size of the windows used for detecting changes. Identifying suitable features that characterize the relations between traces or events is another challenge in most methods. In this work, we propose an automated and window-independent approach for detecting sudden business process drifts by introducing the notion of trace embedding. Trace embedding makes it possible to automatically extract all features from the relations between traces. We show that the proposed approach outperforms all existing methods, with significantly higher accuracy and lower detection delay.

Detecting Concept Drift in Data Stream Using Semi-Supervised Classification

Hossein Hasan Nezhad Namaghi, Hoda Mashayekhi, Morteza Zahedi
Volume 18, Issue 4 (3-2022)

Abstract

A data stream is a sequence of data generated at high speed and in high volume from various information sources. Classifying data streams faces three challenges: unlimited length, online processing, and concept drift. To handle unlimited stream length, related research commonly divides the stream into fixed-size windows or applies gradual forgetting. Concept drift refers to changes in the statistical properties of the data and is divided into four categories: sudden, gradual, incremental, and recurring. It is generally handled by periodically updating the classifier or by employing an explicit change detector to determine when to update. These approaches assume that the true labels are available for all data samples. However, because labeling instances is costly, access to only partial labeling is more realistic. A number of studies that use semi-supervised learning obtain labels from the user, in the form of active learning, to update their models. The purpose of this study is to classify samples in an unlimited data stream in the presence of concept drift, using only a limited set of initial labeled data. To this end, a semi-supervised ensemble learning algorithm for data streams is proposed, which uses entropy variation to detect concept drift and is applicable to sudden and gradual drifts. The proposed model is trained with a limited initial labeled set. When concept drift occurs, the unlabeled data is used to update the ensemble model, without requiring labels from the user. In contrast to many current studies, the proposed algorithm uses an ensemble of K-NN classifiers. It constructs a group of clustering-based classification models, each of which is trained on a batch of data. When a new sample is received, it is first determined whether the sample is an outlier. If the sample falls within a cluster, its class is determined by majority voting.
When a window of the stream is received, the possibility of concept drift is examined based on entropy variation, and the classifier is updated by a semi-supervised approach if necessary. The model itself determines the required data labels. The proposed method is capable of detecting concept drift in the data and of improving its accuracy by updating the learning model with appropriate samples received from the stream. Therefore, the proposed method requires only a small initial labeled set. Experiments are performed on five real and synthetic datasets, and the model's performance is compared with three other approaches. The results show that the proposed method is superior to the other approaches in terms of precision, recall, and F1 score.
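
The abstract does not say which entropy is monitored, so the following is only a minimal sketch of a window-level entropy-variation check. It assumes the quantity of interest is the Shannon entropy of the labels predicted in each window; the function names and the `threshold` value are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy (in bits) of the empirical label distribution."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_drift(reference_labels, current_labels, threshold=0.2):
    """Flag drift when the label entropy changes by more than `threshold` bits.

    `threshold` is an illustrative value, not one reported by the authors.
    Returns (drift_flag, absolute_entropy_change).
    """
    delta = abs(shannon_entropy(current_labels) - shannon_entropy(reference_labels))
    return delta > threshold, delta


# Usage: a balanced reference window versus a heavily skewed current window.
reference = ["a"] * 50 + ["b"] * 50
current = ["a"] * 95 + ["b"] * 5
drifted, change = entropy_drift(reference, current)
print(drifted, round(change, 3))
```

A check of this kind only signals that the label distribution has shifted between windows; deciding which unlabeled samples to use for the semi-supervised update is a separate step that this sketch does not cover.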

Concept drift detection in event logs using statistical information of variants

Fershteh Javadzadeh, Mehdi Yaghoubi, Soheila Karbasi
Volume 19, Issue 1 (5-2022)

Abstract

In recent years, business process management (BPM) has been highly regarded as a means of improving the efficiency and effectiveness of organizations. Extracting and analyzing information on business processes is an important part of BPM. However, these processes are not stable over time and may change for a variety of reasons, such as changes in the environment, human resources, or capital markets, as well as seasonal and climate changes. These changes in business processes are referred to as concept drift in event logs. Discovering concept drifts is one of the challenges in business process management. Drifts may occur suddenly, gradually, periodically, or incrementally. This paper proposes an algorithm for identifying sudden concept drifts in event logs created by BPM. Each execution of a process instance follows a specific path in the process model, called a trace; all traces that follow the same path in the process model are called a variant. The proposed algorithm is based on the distribution of trace variants in the execution of processes. In this method, two sliding windows, named the reference window and the detection window, are moved over the event log, and a feature vector is derived from the trace variants in each window. The variants of the two windows are then compared by applying the statistical G-test, and the drifts are identified. In statistics, the G-test is a likelihood-ratio (maximum likelihood) test of statistical significance. Experiments on artificial datasets show the correctness of the method and its superiority to previous methods. On average, the detection accuracy of the proposed method is 0.06% better than that of state-of-the-art methods.
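
As a rough illustration of the window comparison described in this abstract, the sketch below computes the G statistic, G = 2 Σ O ln(O/E), for the variant counts of a reference window and a detection window. It is a generic G-test of homogeneity, not the authors' implementation; the function names are hypothetical, both windows are assumed to be non-empty, and the caller is assumed to supply a chi-square critical value (for example 3.841 for one degree of freedom at the 0.05 level).

```python
import math
from collections import Counter


def g_statistic(reference_traces, detection_traces):
    """G statistic for the variant counts of two windows.

    Each trace is a sequence of activity names; identical sequences form
    one variant. Returns (G, degrees_of_freedom).
    """
    ref = Counter(tuple(t) for t in reference_traces)
    det = Counter(tuple(t) for t in detection_traces)
    n_ref, n_det = sum(ref.values()), sum(det.values())
    total = n_ref + n_det
    variants = set(ref) | set(det)
    g = 0.0
    for v in variants:
        pooled = ref[v] + det[v]
        for observed, n_window in ((ref[v], n_ref), (det[v], n_det)):
            if observed > 0:  # a zero count contributes nothing to the sum
                expected = pooled * n_window / total
                g += observed * math.log(observed / expected)
    # A 2 x k contingency table has (2 - 1) * (k - 1) degrees of freedom.
    return 2.0 * g, len(variants) - 1


def drift_detected(reference_traces, detection_traces, critical_value):
    """Report drift when G exceeds the caller-supplied critical value."""
    g, _ = g_statistic(reference_traces, detection_traces)
    return g > critical_value


# Usage: the dominant variant changes between the two windows.
abc, acb = ("a", "b", "c"), ("a", "c", "b")
reference = [abc] * 40 + [acb] * 10
detection = [abc] * 10 + [acb] * 40
print(drift_detected(reference, detection, critical_value=3.841))
```

Passing the critical value in keeps the sketch free of external dependencies; a statistics library could instead be used to turn G and the degrees of freedom into a p-value.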

Online Learning for Imbalanced Data Streams with Concept Drift by Belief Theory and Chaotic Function

Dr. Javad Hamidzadeh, Mohammad Ali Rashidi Mahmoodi, Mona Moradi
Volume 20, Issue 4 (3-2024)

Abstract

Continual learning from data streams is a pivotal aspect of machine learning, requiring algorithms capable of adapting to incoming data. However, the ongoing evolution of data streams presents a formidable challenge, as previously acquired knowledge may become outdated. This challenge, known as concept drift, demands timely detection for the effective adaptation of learning models. While various drift detectors have been proposed, they often assume a relatively balanced class distribution. In scenarios with imbalanced data streams, these detectors may exhibit bias toward majority classes, overlooking shifts in minority classes. Moreover, the imbalance among classes can change over time, with roles shifting between majority and minority classes, especially when relationships among classes become complex due to overlapping regions. In this paper, a novel classification method is introduced for imbalanced streaming data affected by concept drift. The proposed method continuously monitors arriving streams to detect and adapt to both imbalance and concept drift. Upon receiving a new block of data, it employs k-means clustering to identify non-dense regions and performs oversampling for minority classes. Cluster centers are selected using the belief function to address overlap between majority and minority classes. Using a chaotic approach, a new sample is added based on its neighborhood and the size of thresholds that cover time intervals and classification errors. Finally, labels are predicted by ensemble learning with weighted majority voting. Experiments on benchmark datasets from the UCI database evaluate the performance of the proposed method using Leave-One-Out (LOO) validation and comparisons with state-of-the-art methods.
The results demonstrate the superiority of the proposed method across various evaluation criteria, highlighting its effectiveness in addressing imbalanced streaming data with concept drift.
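
The final prediction step named in this abstract, weighted majority voting, can be sketched generically as follows. The abstract does not state how the ensemble weights are obtained, so the weights here are simply assumed to be non-negative numbers supplied by the caller (for instance, each member's recent accuracy); the function name is hypothetical.

```python
from collections import defaultdict


def weighted_majority_vote(predictions, weights):
    """Return the label with the largest total weight.

    predictions: one predicted label per ensemble member.
    weights: non-negative weights aligned with `predictions`.
    """
    if len(predictions) != len(weights):
        raise ValueError("predictions and weights must have the same length")
    if not predictions:
        raise ValueError("at least one prediction is required")
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)


# Usage: one highly weighted member outvotes two weaker ones.
print(weighted_majority_vote(["minority", "majority", "majority"], [0.9, 0.3, 0.4]))
```

With equal weights this reduces to plain majority voting, which is why the weighting scheme matters most when a minority class is predicted by only a few reliable members.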
