Processes supported by process-aware information systems are subject to continuous and often subtle changes due to evolving operational, organizational, or regulatory factors. These changes, referred to as incremental concept drift, gradually alter the behavior or structure of processes, making their detection and localization a challenging task. Traditional process mining techniques frequently assume process stationarity and are limited in their ability to detect such drift, particularly from a control-flow perspective. The objective of this research is to develop an interpretable and robust framework capable of detecting and localizing incremental concept drift in event logs, with a specific emphasis on the structural evolution of control-flow semantics in processes. We propose DriftXMiner, a control-flow-aware hybrid framework that combines statistical, machine learning, and process model analysis techniques. The approach comprises three key components: (1) a Cumulative Drift Scanner that tracks directional statistical deviations to detect early drift signals; (2) a Temporal Clustering and Drift-Aware Forest Ensemble (DAFE) to capture distributional and classification-level changes in process behavior; and (3) Petri net-based process model reconstruction, which enables the precise localization of structural drift using transition deviation metrics and replay fitness scores. Experimental validation on the BPI Challenge 2017 event log demonstrates that DriftXMiner effectively identifies and localizes gradual and incremental process drift over time. The framework achieves a detection accuracy of 92.5%, a localization precision of 90.3%, and an F1-score of 0.91, outperforming competitive baselines such as CUSUM+Histograms and ADWIN+Alpha Miner. Visual analyses further confirm that identified drift points align with transitions in control-flow models and behavioral cluster structures. DriftXMiner offers a novel and interpretable solution for incremental concept drift detection and localization in dynamic, process-aware systems. By integrating statistical signal accumulation, temporal behavior profiling, and structural process mining, the framework enables fine-grained drift explanation and supports adaptive process intelligence in evolving environments. Its modular architecture supports extension to streaming data and real-time monitoring contexts.
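The idea of accumulating directional deviations can be illustrated with a generic two-sided CUSUM, the statistic behind the CUSUM+Histograms baseline named in the abstract. This is a minimal sketch and not the authors' Cumulative Drift Scanner, whose details the abstract does not give; the function name and the `target`, `slack`, and `threshold` parameters are assumptions chosen for illustration.

```python
def cusum_drift_points(values, target, slack=0.5, threshold=5.0):
    """Return the indices at which a two-sided CUSUM raises an alarm.

    values    : sequence of numeric observations (e.g. a per-window statistic)
    target    : assumed in-control mean of the observations
    slack     : deviation tolerated before evidence starts to accumulate
    threshold : accumulated evidence required to raise an alarm
    """
    upward, downward = 0.0, 0.0
    alarms = []
    for index, value in enumerate(values):
        # Accumulate evidence separately for upward and downward shifts,
        # never letting either sum drop below zero.
        upward = max(0.0, upward + (value - target - slack))
        downward = max(0.0, downward + (target - value - slack))
        if upward > threshold or downward > threshold:
            alarms.append(index)
            # Restart accumulation after each alarm.
            upward, downward = 0.0, 0.0
    return alarms


if __name__ == "__main__":
    stream = [0.0] * 20 + [2.0] * 20  # mean shifts upward at index 20
    print(cusum_drift_points(stream, target=0.0))
```

Because evidence accumulates over several observations, the first alarm arrives a few steps after the true change point; a smaller `threshold` shortens that delay at the cost of more false alarms.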
Purpose: This research addresses the challenge of concept drift in AI-enabled software, particularly within autonomous vehicle systems where concept drift in object recognition (like pedestrian detection) can lead to misclassifications and safety risks. This study introduces a proactive framework to detect early signs of domain-specific concept drift by leveraging domain analysis and natural language processing techniques. This method is designed to help maintain the relevance of domain knowledge and prevent potential failures in AI systems due to evolving concept definitions. Design/methodology/approach: The proposed framework integrates natural language processing and image analysis to continuously update and monitor key domain concepts against evolving external data sources, such as social media and news. By identifying terms and features closely associated with core concepts, the system anticipates and flags significant changes. This was tested in the automotive domain on the pedestrian concept, where the framework was evaluated for its capacity to detect shifts in the recognition of pedestrians, particularly during events like Halloween and specific car accidents. Findings: The framework demonstrated an ability to detect shifts in the domain concept of pedestrians, as evidenced by contextual changes around major events. While it successfully identified pedestrian-related drift, the system's accuracy varied when overlapping with larger social events. The results indicate the model's potential to foresee relevant shifts before they impact autonomous systems, although further refinement is needed to handle high-impact concurrent events. Research limitations: This study focused on detecting concept drift in the pedestrian domain within autonomous vehicles, with results varying across domains. To assess generalizability, we tested the framework for airplane-related incidents and demonstrated adaptability. However, unpredictable events and data biases from social media and news may obscure domain-specific drifts. Further evaluation across diverse applications is needed to enhance robustness in evolving AI environments. Practical implications: The proactive detection of concept drift has significant implications for AI-driven domains, especially in safety-critical applications like autonomous driving. By identifying early signs of drift, this framework provides actionable insights for AI system updates, potentially reducing misclassification risks and enhancing public safety. Moreover, it enables timely interventions, reducing costly and labor-intensive retraining requirements by focusing only on the relevant aspects of evolving concepts. This method offers a streamlined approach for maintaining AI system performance in environments where domain knowledge rapidly changes. Originality/value: This study contributes a novel domain-agnostic framework that combines natural language processing with image analysis to predict concept drift early. This unique approach, which is focused on real-time data sources, offers an effective and scalable solution for addressing the evolving nature of domain-specific concepts in AI applications.
With the gradual penetration of the internet of things (IoT) into all areas of life, the number of IoT devices is growing explosively. The era of the internet of everything is coming, and the importance of IoT security is becoming increasingly prominent. Because IoT devices come in many types, they may carry different security vulnerabilities, and unknown attack forms and virus samples keep appearing. In other words, the large number of IoT devices, the large data volumes, and the variety of attack forms pose a major challenge for malicious traffic identification. To solve these problems, this paper proposes a concept drift detection and adaptation (CDDA) method for an IoT security framework. The performance of AI models is evaluated by verifying the effectiveness of data drift detection on IoT traffic, so as to select the best AI model. Experimental tests confirm the feasibility of the framework and the adaptive method in practice, and the effect on IoT traffic identification performance is also verified.
Big data streams have become ubiquitous in recent years, thanks to the rapid generation of massive volumes of data by different applications. It is challenging to apply existing data mining tools and techniques directly to these big data streams. At the same time, streaming data from several applications gives rise to two major problems: class imbalance and concept drift. This paper presents a new Multi-Objective Metaheuristic Optimization-based Big Data Analytics with Concept Drift Detection (MOMBD-CDD) method for high-dimensional streaming data. The MOMBD-CDD model has three operational stages: pre-processing, concept drift detection (CDD), and classification. The model overcomes the class imbalance problem with the Synthetic Minority Over-sampling Technique (SMOTE). The Glowworm Swarm Optimization (GSO) algorithm is employed to determine the oversampling rates and neighboring point values of SMOTE. In addition, the Statistical Test of Equal Proportions (STEPD) is used as the CDD technique. Finally, a Bidirectional Long Short-Term Memory (Bi-LSTM) model is applied for classification. To improve classification performance and to compute the optimum parameters for the Bi-LSTM model, a GSO-based hyperparameter tuning process is carried out. The performance of the model was evaluated on high-dimensional benchmark streaming datasets, namely the intrusion detection (NSL KDDCup) dataset and the ECUE spam dataset. An extensive experimental validation process confirmed the effectiveness of the MOMBD-CDD model. The proposed model attained high accuracy of 97.45% and 94.23% on the applied KDDCup99 and ECUE Spam datasets, respectively.
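The STEPD step can be sketched as a test of equal proportions between a classifier's accuracy on older examples and on a recent window. The code below follows the commonly described form of the statistic with a continuity correction; it is an illustrative sketch, not the MOMBD-CDD implementation, and the function names and the 0.05 and 0.003 significance levels are assumptions.

```python
import math


def stepd_p_value(correct_old, n_old, correct_recent, n_recent):
    """One-sided p-value of a test of equal proportions on two accuracies."""
    pooled = (correct_old + correct_recent) / (n_old + n_recent)
    if pooled in (0.0, 1.0):
        return 1.0  # no variance, so the two accuracies cannot differ
    inverse_sizes = 1.0 / n_old + 1.0 / n_recent
    difference = abs(correct_old / n_old - correct_recent / n_recent)
    # Continuity-corrected statistic, compared with a standard normal.
    statistic = (difference - 0.5 * inverse_sizes) / math.sqrt(
        pooled * (1.0 - pooled) * inverse_sizes
    )
    # Upper-tail probability of the standard normal distribution.
    return 0.5 * math.erfc(statistic / math.sqrt(2.0))


def stepd_state(correct_old, n_old, correct_recent, n_recent,
                warning_level=0.05, drift_level=0.003):
    """Map the p-value to 'drift', 'warning', or 'stable' (levels assumed)."""
    p_value = stepd_p_value(correct_old, n_old, correct_recent, n_recent)
    if p_value < drift_level:
        return "drift"
    if p_value < warning_level:
        return "warning"
    return "stable"


if __name__ == "__main__":
    # 90% accuracy on 200 older examples, 50% on the 30 most recent ones.
    print(stepd_state(180, 200, 15, 30))
```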
Concept drift is a major security issue that must be resolved, since it presents a significant barrier to the deployment of machine learning (ML) models. Due to dynamic behavior changes by attackers (and/or their benign equivalents), the testing data distribution frequently diverges from the original training data over time, resulting in substantial model failures. Due to their dispersed and dynamic nature, distributed denial-of-service attacks pose a danger to cybersecurity, with serious consequences for users and businesses. This paper proposes a novel design for concept drift analysis and detection of malware attacks such as Distributed Denial of Service (DDoS) in the network. The goal of this architecture combination is to accurately represent data and create an effective cyber security prediction agent. The intrusion detection system and the concept drift of the network are analyzed using secure adaptive windowing with website data authentication protocol (SAW_WDA). The network is analyzed by the authentication protocol to avoid malware attacks. The data of network users are collected and classified using multilayer perceptron gradient decision tree (MLPGDT) classifiers. Based on the classification output, attackers and authorized users are identified. The experimental results are reported for the intrusion detection and concept drift analysis systems in terms of throughput, end-to-end delay, network security, and network concept drift, and for classification in terms of accuracy, memory, precision, and F1-score.
Every application in a smart city environment, such as the smart grid, health monitoring, security, and surveillance, generates non-stationary data streams. Because of this, the statistical properties of the data change over time, leading to class imbalance and concept drift issues. Both of these issues degrade model performance. Most current work has focused on developing an ensemble strategy that trains a new classifier on the latest data to resolve the issue. These techniques suffer when training the new classifier if the data is imbalanced. Also, the class imbalance ratio may change greatly from one input stream to another, making the problem more complex. The existing solutions proposed for the combined issue of class imbalance and concept drift lack an understanding of how one problem correlates with the other. This work studies the association between concept drift and class imbalance ratio and then demonstrates how changes in class imbalance ratio along with concept drift affect the classifier's performance. We analyzed the effect of both issues on the minority and majority classes individually. To do this, we conducted experiments on benchmark datasets using state-of-the-art classifiers especially designed for data stream classification. Precision, recall, F1 score, and geometric mean were used to measure performance. Our findings show that when class imbalance and concept drift occur together, performance can decrease by up to 15%. Our results also show that an increase in the imbalance ratio can cause a 10% to 15% decrease in the precision scores of both minority and majority classes. The study findings may help in designing intelligent and adaptive solutions that can cope with the challenges of non-stationary data streams, such as concept drift and class imbalance.
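The four measures named in this abstract can be computed from a binary confusion matrix. The sketch below assumes the usual binary definition of the geometric mean, the square root of sensitivity times specificity; the abstract does not say which variant the study used, and the function name and the convention of returning 0.0 for undefined ratios are illustrative choices.

```python
import math


def binary_metrics(y_true, y_pred, positive=1):
    """Precision, recall, F1, and G-mean for the class labelled `positive`."""
    tp = fp = tn = fn = 0
    for truth, guess in zip(y_true, y_pred):
        if guess == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    # Undefined ratios (zero denominator) are reported as 0.0 by convention.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    g_mean = math.sqrt(recall * specificity)
    return {"precision": precision, "recall": recall,
            "f1": f1, "g_mean": g_mean}


if __name__ == "__main__":
    # Imbalanced toy stream: 2 minority (1) and 8 majority (0) examples.
    truth = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    guess = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
    print(binary_metrics(truth, guess, positive=1))
```

Calling the function once with `positive=1` and once with `positive=0` gives the per-class view of minority and majority performance that the study describes.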
Ensemble learning models can effectively detect drift and use diversity to improve their adaptation to drift. However, local concept drift can occur in different types at different time points, which makes it difficult for base learners to distinguish the drift of local boundaries and difficult to determine the drift range. Adapting an ensemble learning model to local concept drift therefore remains a challenging problem. Moreover, decision boundaries often differ after drift adaptation, so an overall diversity measurement is inappropriate. To address these two issues, this paper proposes a novel ensemble learning model called instance-weighted ensemble learning based on the three-way decision (IWE-TWD). In IWE-TWD, a divide-and-conquer strategy is employed to handle uncertain drift and to select base learners; density clustering dynamically constructs density regions to lock the drift range; the three-way decision is adopted to estimate whether the region distribution changes, and each instance is weighted with the probability of region distribution change; the diversities between base learners are also determined with the three-way decision. Experimental results show that IWE-TWD performs better than state-of-the-art models in data stream classification on ten synthetic data sets and seven real-world data sets.
One recent area of interest in computer science is data stream management and processing. By 'data stream', we refer to continuous and rapidly generated packages of data. Specific features of data streams are immense volume, high production rate, limited data processing time, and data concept drift; these features differentiate the data stream from standard types of data. An issue for the data stream is classification of input data. A novel ensemble classifier is proposed in this paper. The classifier uses base classifiers of two weighting functions under different data input conditions. In addition, a new method is used to determine drift, which emphasizes the precision of the algorithm. Another characteristic of the proposed method is removal of different numbers of the base classifiers based on their quality. Implementation of a weighting mechanism to the base classifiers at the decision-making stage is another advantage of the algorithm. This facilitates adaptability when drifts take place, which leads to classifiers with higher efficiency. Furthermore, the proposed method is tested on a set of standard data and the results confirm higher accuracy compared to available ensemble classifiers and single classifiers. In addition, in some cases the proposed classifier is faster and needs less storage space.
Cardiovascular diseases (CVDs) continue to present a leading cause of mortality worldwide, emphasizing the importance of early and accurate prediction. Electrocardiogram (ECG) signals, central to cardiac monitoring, have increasingly been integrated with Deep Learning (DL) for real-time prediction of CVDs. However, DL models are prone to performance degradation due to concept drift and to catastrophic forgetting. To address this issue, we propose a real-time CVDs prediction approach, referred to as ADWIN-GFR, that combines Convolutional Neural Network (CNN) layers, for spatial feature extraction, with Gated Recurrent Units (GRU), for temporal modeling, alongside adaptive drift detection and mitigation mechanisms. The proposed approach integrates Adaptive Windowing (ADWIN) for real-time concept drift detection, a fine-tuning strategy based on Generative Features Replay (GFR) to preserve previously acquired knowledge, and a dynamic replay buffer ensuring variance, diversity, and data distribution coverage. Extensive experiments conducted on the MIT-BIH arrhythmia dataset demonstrate that ADWIN-GFR outperforms standard fine-tuning techniques, achieving an average post-drift accuracy of 95.4%, a macro F1-score of 93.9%, and a remarkably low forgetting score of 0.9%. It also exhibits an average drift detection delay of 12 steps and achieves an adaptation gain of 17.2%. These findings underscore the potential of ADWIN-GFR for deployment in real-world cardiac monitoring systems, including wearable ECG devices and hospital-based patient monitoring platforms.
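Concept drift is an unavoidable and difficult problem in data stream mining; its defining characteristic is that the data distribution may change over time. To address the overfitting that existing models exhibit on data stream classification tasks, this paper proposes an Online Deep Network driven by Target Decoupling (ODNTD). First, the model learns a task-agnostic feature extractor from historical data streams, achieving representation learning that is unbiased with respect to tasks and thereby strengthening the model's generalization ability. Second, the model uses task-specific weight adjustment so that the task-agnostic general feature representation can adapt to the specific task; this weight learning for the target task further improves the model's adaptability. Experimental results show that the proposed method generalizes well on data streams with concept drift.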
We applied the decision tree algorithm to learn association rules between a webpage's category (pornographic or normal) and its critical features. Based on these rules, we proposed an efficient method of filtering pornographic webpages with the following major advantages: 1) a weighted window-based technique estimates concept drift in the keywords recently found in pornographic webpages; 2) only the contexts of webpages are checked, without scanning pictures; 3) an incremental learning mechanism updates the pornographic keyword database incrementally.
Textual data streams have been extensively used in practical applications where consumers of online products express their views on those products. Due to changes in data distribution, commonly referred to as concept drift, mining such data streams is a challenging problem for researchers. The majority of existing drift detection techniques are based on classification errors, which have higher probabilities of false-positive or missed detections. To improve classification accuracy, there is a need to develop more intuitive detection techniques that can identify a great number of drifts in the data streams. This paper presents an adaptive unsupervised learning technique, an ensemble classifier based on drift detection for opinion mining and sentiment classification. To improve classification performance, this approach uses four different dissimilarity measures to determine the degree of concept drift in the data stream. Whenever a drift is detected, the proposed method builds a new classifier and adds it to the ensemble. Before adding a new classifier, the method checks whether the ensemble has reached its size limit; if so, the classifier with the least weight is removed from the ensemble. To this end, a weighting mechanism is used to calculate the weight of each classifier, which decides the contribution of each classifier to the final classification results. Several experiments were conducted on real-world datasets, and the results were evaluated on the false positive rate, miss detection rate, and accuracy measures. The proposed method is also compared with state-of-the-art methods, including DDM, EDDM, and PageHinkley with support vector machine (SVM) and Naive Bayes classifiers, which are frequently used in concept drift detection studies. In all cases, the results show the efficiency of our proposed method.
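The ensemble maintenance described in this abstract (add a classifier when drift is detected, evict the lowest-weight member once the size limit is reached, combine predictions by weight) can be sketched as follows. This is a hypothetical illustration: the class name, the `max_size` parameter, and the caller-supplied weights are assumptions, since the abstract does not specify how weights are computed.

```python
from collections import defaultdict


class WeightedEnsemble:
    """Size-capped ensemble with weighted voting (illustrative sketch)."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.members = []  # list of (predict_function, weight) pairs

    def add(self, predict_function, weight):
        """Add a member, evicting the lowest-weight one if the cap is hit."""
        if len(self.members) >= self.max_size:
            weakest = min(range(len(self.members)),
                          key=lambda i: self.members[i][1])
            del self.members[weakest]
        self.members.append((predict_function, weight))

    def predict(self, sample):
        """Return the label with the largest total weight behind it."""
        votes = defaultdict(float)
        for predict_function, weight in self.members:
            votes[predict_function(sample)] += weight
        return max(votes, key=votes.get)


if __name__ == "__main__":
    ensemble = WeightedEnsemble(max_size=2)
    ensemble.add(lambda text: "positive", weight=0.9)
    ensemble.add(lambda text: "negative", weight=0.2)
    # A drift is assumed to have been detected here, so a new member is
    # added; the member with weight 0.2 is evicted to respect the cap.
    ensemble.add(lambda text: "negative", weight=0.5)
    print(ensemble.predict("great product"))
```

In a real stream the weights would be refreshed from recent performance before each eviction decision; here they are fixed to keep the example short.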
Funding: Supported by U.S. Office of Naval Research (ONR) Grant number G2A62826.
Funding: Supported by the 2023 Teaching Research Project of the Education Department of Anhui Province: Exploration of Optimizing Teaching Strategies for Embedded Courses in the Context of "New Engineering" (Project No. 2023jyxm0460); the 2024 High-quality Course on Ideological and Political Education Integrated into Curriculum at Anhui University of Engineering: "Data Structures and Algorithms" (Project No. 2024szyzk40); and the Industry-University-Research Cooperation Project of Anhui University of Engineering: "Online detection of surface quality defects in high-speed wire rod" (Project No. HX-2024-11-003).
Funding: The Taif University Deanship of Scientific Research supported this endeavor (Project Number: 1-443-4), for which the authors are grateful to Taif University for their kind support.
Funding: The authors would like to extend their gratitude to Universiti Teknologi PETRONAS (Malaysia) for funding this research through grant number (015LA0-037).
Funding: Supported by the Postdoctoral Fellowship Program of CPSF (No. GZB20230092), the China Postdoctoral Science Foundation (No. 2023M740383), and the Natural Science Foundation of Sichuan Province (No. 24NSFSC1654).
Abstract: Ensemble learning models can effectively detect drift and exploit diversity to improve adaptation to drift. However, local concept drift can occur in different types at different time points, which makes it difficult for base learners to distinguish drift at local boundaries and difficult to determine the drift range. Thus, adapting an ensemble learning model to local concept drift remains a challenging problem. Moreover, decision boundaries often differ after drift adaptation, so an overall diversity measurement is inappropriate. To address these two issues, this paper proposes a novel ensemble learning model called instance-weighted ensemble learning based on the three-way decision (IWE-TWD). In IWE-TWD, a divide-and-conquer strategy is employed to handle uncertain drift and to select base learners; density clustering dynamically constructs density regions to lock the drift range; the three-way decision is adopted to estimate whether the region distribution changes, and each instance is weighted with the probability of region distribution change; the diversities between base learners are also determined with the three-way decision. Experimental results show that IWE-TWD performs better than state-of-the-art models in data stream classification on ten synthetic data sets and seven real-world data sets.
Abstract: One recent area of interest in computer science is data stream management and processing. By "data stream", we refer to continuous and rapidly generated packages of data. Specific features of data streams are immense volume, high production rate, limited data processing time, and data concept drift; these features differentiate the data stream from standard types of data. One issue for data streams is the classification of input data. A novel ensemble classifier is proposed in this paper. The classifier uses base classifiers of two weighting functions under different data input conditions. In addition, a new method is used to determine drift, which emphasizes the precision of the algorithm. Another characteristic of the proposed method is the removal of different numbers of base classifiers based on their quality. A further advantage of the algorithm is a weighting mechanism applied to the base classifiers at the decision-making stage. This facilitates adaptability when drifts take place, which leads to classifiers with higher efficiency. Furthermore, the proposed method is tested on a set of standard data, and the results confirm higher accuracy compared to available ensemble classifiers and single classifiers. In addition, in some cases the proposed classifier is faster and needs less storage space.
Funding: Supported by Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2025R196), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.
Abstract: Cardiovascular diseases (CVDs) continue to be a leading cause of mortality worldwide, emphasizing the importance of early and accurate prediction. Electrocardiogram (ECG) signals, central to cardiac monitoring, have increasingly been integrated with Deep Learning (DL) for real-time prediction of CVDs. However, DL models are prone to performance degradation due to concept drift and to catastrophic forgetting. To address this issue, we propose a real-time CVD prediction approach, referred to as ADWIN-GFR, that combines Convolutional Neural Network (CNN) layers for spatial feature extraction with Gated Recurrent Units (GRU) for temporal modeling, alongside adaptive drift detection and mitigation mechanisms. The proposed approach integrates Adaptive Windowing (ADWIN) for real-time concept drift detection, a fine-tuning strategy based on Generative Features Replay (GFR) to preserve previously acquired knowledge, and a dynamic replay buffer ensuring variance, diversity, and data distribution coverage. Extensive experiments conducted on the MIT-BIH arrhythmia dataset demonstrate that ADWIN-GFR outperforms standard fine-tuning techniques, achieving an average post-drift accuracy of 95.4%, a macro F1-score of 93.9%, and a remarkably low forgetting score of 0.9%. It also exhibits an average drift detection delay of 12 steps and achieves an adaptation gain of 17.2%. These findings underscore the potential of ADWIN-GFR for deployment in real-world cardiac monitoring systems, including wearable ECG devices and hospital-based patient monitoring platforms.
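The adaptive windowing idea mentioned in this abstract can be sketched in a few lines. The version below is a deliberately simplified variant written for illustration, not the ADWIN-GFR implementation: it stores the whole window and tests every split point (production ADWIN compresses the window into buckets), it assumes inputs in [0, 1] such as a per-sample error indicator, and the class name `SimpleAdwin` and the default `delta` are assumptions. The cut threshold follows the Hoeffding-style bound used in the basic ADWIN formulation.

```python
import math


class SimpleAdwin:
    """Naive adaptive window over values in [0, 1] (illustrative only)."""

    def __init__(self, delta=0.002):
        self.delta = delta   # confidence parameter (assumed default)
        self.window = []     # oldest value first

    def update(self, value):
        """Append a value; return True if old data was dropped (drift signal)."""
        self.window.append(value)
        drift = False
        while self._shrink_once():
            drift = True
        return drift

    def _shrink_once(self):
        n = len(self.window)
        if n < 2:
            return False
        total = sum(self.window)
        log_term = math.log(4.0 * n / self.delta)  # ln(4 / (delta / n))
        head_sum = 0.0
        for split in range(1, n):
            head_sum += self.window[split - 1]
            n0, n1 = split, n - split
            mean0 = head_sum / n0
            mean1 = (total - head_sum) / n1
            m = 1.0 / (1.0 / n0 + 1.0 / n1)
            eps_cut = math.sqrt(log_term / (2.0 * m))
            if abs(mean0 - mean1) > eps_cut:
                # The two sub-windows differ too much: forget the older one.
                del self.window[:split]
                return True
        return False


# Toy stream: the monitored value jumps from 0 to 1 at position 100.
detector = SimpleAdwin()
stream = [0.0] * 100 + [1.0] * 100
detections = [i for i, v in enumerate(stream) if detector.update(v)]
```

In a pipeline like the one described, a True return value would be the trigger for the mitigation step (here, fine-tuning with replayed features); how that step is wired in is not specified by the abstract.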
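Abstract: Concept drift is an unavoidable and difficult problem in data stream mining; its defining characteristic is that the data distribution may change over time. To address the overfitting that existing models exhibit on data stream classification tasks, this paper proposes an Online Deep Network driven by Target Decoupling (ODNTD). First, the model learns a task-agnostic feature extractor from the historical data stream, which yields unbiased representation learning with respect to the task and thereby strengthens the model's generalization ability. Second, the model uses task-specific weight adjustment so that the task-agnostic general feature representation can adapt to the specific task; this weight learning for the target task further improves the model's adaptability. Experimental results show that the proposed method generalizes well on data streams with concept drift.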
Funding: Supported by MOST under Grant No. MOST 103-2410-H-004-112.
Abstract: We applied the decision tree algorithm to learn association rules between a webpage’s category (pornographic or normal) and its critical features. Based on these rules, we proposed an efficient method of filtering pornographic webpages with the following major advantages: 1) a weighted window-based technique estimates the condition of concept drift for the keywords found recently in pornographic webpages; 2) only the contexts of webpages are checked, without scanning pictures; 3) an incremental learning mechanism incrementally updates the pornographic keyword database.
Funding: The authors extend their appreciation to the Deanship of Scientific Research at King Khalid University for funding this work through Large Groups (Project under Grant Number RGP.2/49/43).
Abstract: Textual data streams have been used extensively in practical applications where consumers of online products express their views about those products. Due to changes in data distribution, commonly referred to as concept drift, mining such data streams is a challenging problem for researchers. The majority of existing drift detection techniques are based on classification errors, which have higher probabilities of false-positive or missed detections. To improve classification accuracy, there is a need for more intuitive detection techniques that can identify a greater number of drifts in the data streams. This paper presents an adaptive unsupervised learning technique, an ensemble classifier based on drift detection for opinion mining and sentiment classification. To improve classification performance, this approach uses four different dissimilarity measures to determine the degree of concept drift in the data stream. Whenever a drift is detected, the proposed method builds a new classifier and adds it to the ensemble. Before the new classifier is added, the total number of classifiers in the ensemble is checked; if the limit is exceeded, the classifier with the least weight is removed from the ensemble. To this end, a weighting mechanism calculates the weight of each classifier, which determines the contribution of each classifier to the final classification results. Several experiments were conducted on real-world datasets, and the results were evaluated on the false positive rate, miss detection rate, and accuracy measures. The proposed method is also compared with state-of-the-art methods, including DDM, EDDM, and Page-Hinkley with support vector machine (SVM) and Naive Bayes classifiers, which are frequently used in concept drift detection studies. In all cases, the results show the efficiency of the proposed method.
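The ensemble update rule described in this abstract (add a classifier on drift, evict the lowest-weight member when the size limit is reached, combine predictions by weight) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the class `WeightedEnsemble`, its `max_size` parameter, and the use of plain callables as classifiers are hypothetical, and the weights are supplied by the caller because the abstract does not give the weighting formula.

```python
from collections import defaultdict


class WeightedEnsemble:
    """Size-bounded ensemble with weighted voting (illustrative sketch)."""

    def __init__(self, max_size):
        if max_size < 1:
            raise ValueError("max_size must be at least 1")
        self.max_size = max_size
        self.members = []  # list of (predict_fn, weight) pairs

    def add(self, predict_fn, weight):
        """Add a classifier, evicting the lowest-weight member if full."""
        if len(self.members) >= self.max_size:
            weakest = min(range(len(self.members)),
                          key=lambda i: self.members[i][1])
            del self.members[weakest]
        self.members.append((predict_fn, weight))

    def predict(self, x):
        """Return the label with the largest total weight of votes."""
        if not self.members:
            raise ValueError("ensemble is empty")
        votes = defaultdict(float)
        for predict_fn, weight in self.members:
            votes[predict_fn(x)] += weight
        return max(votes, key=votes.get)


# Toy usage with constant classifiers standing in for trained models.
ensemble = WeightedEnsemble(max_size=2)
ensemble.add(lambda text: "positive", weight=0.2)
ensemble.add(lambda text: "negative", weight=0.5)
label_before = ensemble.predict("some review")
ensemble.add(lambda text: "positive", weight=0.9)  # evicts the 0.2 member
label_after = ensemble.predict("some review")
```

In the described method, `add` would be called each time the dissimilarity-based detector signals a drift, with the new member trained on recent data.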