In today's connected world,the generation of massive streaming data across diverse domains has become commonplace.In the presence of concept drift,class imbalance,label scarcity,and new class emergence,these chall...In today's connected world,the generation of massive streaming data across diverse domains has become commonplace.In the presence of concept drift,class imbalance,label scarcity,and new class emergence,these challenges jointly degrade representation stability,bias learning toward outdated distributions,and reduce the resilience and reliability of detection in dynamic environments.This paper proposes a streaming classincremental learning(SCIL)framework to address these issues.The SCIL framework integrates an autoencoder(AE)with a multi-layer perceptron for multi-class prediction,employs a dual-loss strategy(classification and reconstruction)for prediction and new class detection,uses corrected pseudo-labels for online training,manages classes with queues,and applies oversampling to handle imbalance.The rationale behind the method's structure is elucidated through ablation studies,and a comprehensive experimental evaluation is performed using both real-world and synthetic datasets that feature class imbalance,incremental classes,and concept drifts.Our results demonstrate that SCIL outperforms strong baselines and state-of-the-art methods.In line with our commitment to Open Science,we make our code and datasets available to the community.展开更多
Logistic regression is a fast classifier and can achieve higher accuracy on small training data.Moreover,it can work on both discrete and continuous attributes with nonlinear patterns.Based on these properties of logi...Logistic regression is a fast classifier and can achieve higher accuracy on small training data.Moreover,it can work on both discrete and continuous attributes with nonlinear patterns.Based on these properties of logistic regression,this paper proposed an algorithm,called evolutionary logistical regression classifier(ELRClass),to solve the classification of evolving data streams.This algorithm applies logistic regression repeatedly to a sliding window of samples in order to update the existing classifier,to keep this classifier if its performance is deteriorated by the reason of bursting noise,or to construct a new classifier if a major concept drift is detected.The intensive experimental results demonstrate the effectiveness of this algorithm.展开更多
The analytical capacity of massive data has become increasingly necessary, given the high volume of data that has been generated daily by different sources. The data sources are varied and can generate a huge amount o...The analytical capacity of massive data has become increasingly necessary, given the high volume of data that has been generated daily by different sources. The data sources are varied and can generate a huge amount of data, which can be processed in batch or stream settings. The stream setting corresponds to the treatment of a continuous sequence of data that arrives in real-time flow and needs to be processed in real-time. The models, tools, methods and algorithms for generating intelligence from data stream culminate in the approaches of Data Stream Mining and Data Stream Learning. The activities of such approaches can be organized and structured according to Engineering principles, thus allowing the principles of Analytical Engineering, or more specifically, Analytical Engineering for Data Stream (AEDS). Thus, this article presents the AEDS conceptual framework composed of four pillars (Data, Model, Tool, People) and three processes (Acquisition, Retention, Review). The definition of these pillars and processes is carried out based on the main components of data stream setting, corresponding to four pillars, and also on the necessity to operationalize the activities of an Analytical Organization (AO) in the use of AEDS four pillars, which determines the three proposed processes. The AEDS framework favors the projects carried out in an AO, that is, its Analytical Projects (AP), to favor the delivery of results, or Analytical Deliverables (AD), carried out by the Analytical Teams (AT) in order to provide intelligence from stream data.展开更多
基金supported by the European Research Council(ERC)under Grant Agreement No.951424(Water-Futures)by the Republic of Cyprus through the Deputy Ministry of Research,Innovation and Digital Policy.
文摘In today's connected world,the generation of massive streaming data across diverse domains has become commonplace.In the presence of concept drift,class imbalance,label scarcity,and new class emergence,these challenges jointly degrade representation stability,bias learning toward outdated distributions,and reduce the resilience and reliability of detection in dynamic environments.This paper proposes a streaming classincremental learning(SCIL)framework to address these issues.The SCIL framework integrates an autoencoder(AE)with a multi-layer perceptron for multi-class prediction,employs a dual-loss strategy(classification and reconstruction)for prediction and new class detection,uses corrected pseudo-labels for online training,manages classes with queues,and applies oversampling to handle imbalance.The rationale behind the method's structure is elucidated through ablation studies,and a comprehensive experimental evaluation is performed using both real-world and synthetic datasets that feature class imbalance,incremental classes,and concept drifts.Our results demonstrate that SCIL outperforms strong baselines and state-of-the-art methods.In line with our commitment to Open Science,we make our code and datasets available to the community.
文摘Logistic regression is a fast classifier and can achieve higher accuracy on small training data.Moreover,it can work on both discrete and continuous attributes with nonlinear patterns.Based on these properties of logistic regression,this paper proposed an algorithm,called evolutionary logistical regression classifier(ELRClass),to solve the classification of evolving data streams.This algorithm applies logistic regression repeatedly to a sliding window of samples in order to update the existing classifier,to keep this classifier if its performance is deteriorated by the reason of bursting noise,or to construct a new classifier if a major concept drift is detected.The intensive experimental results demonstrate the effectiveness of this algorithm.
文摘The analytical capacity of massive data has become increasingly necessary, given the high volume of data that has been generated daily by different sources. The data sources are varied and can generate a huge amount of data, which can be processed in batch or stream settings. The stream setting corresponds to the treatment of a continuous sequence of data that arrives in real-time flow and needs to be processed in real-time. The models, tools, methods and algorithms for generating intelligence from data stream culminate in the approaches of Data Stream Mining and Data Stream Learning. The activities of such approaches can be organized and structured according to Engineering principles, thus allowing the principles of Analytical Engineering, or more specifically, Analytical Engineering for Data Stream (AEDS). Thus, this article presents the AEDS conceptual framework composed of four pillars (Data, Model, Tool, People) and three processes (Acquisition, Retention, Review). The definition of these pillars and processes is carried out based on the main components of data stream setting, corresponding to four pillars, and also on the necessity to operationalize the activities of an Analytical Organization (AO) in the use of AEDS four pillars, which determines the three proposed processes. The AEDS framework favors the projects carried out in an AO, that is, its Analytical Projects (AP), to favor the delivery of results, or Analytical Deliverables (AD), carried out by the Analytical Teams (AT) in order to provide intelligence from stream data.