Enhancing IoT Resilience at the Edge: A Resource-Efficient Framework for Real-Time Anomaly Detection in Streaming Data
1
Authors: Kirubavathi G., Arjun Pulliyasseri, Aswathi Rajesh, Amal Ajayan, Sultan Alfarhood, Mejdl Safran, Meshal Alfarhood, Jungpil Shin. Computer Modeling in Engineering & Sciences, 2025, No. 6, pp. 3005-3031 (27 pages)
The exponential expansion of the Internet of Things (IoT), Industrial Internet of Things (IIoT), and Transportation Management of Things (TMoT) produces vast amounts of real-time streaming data. Ensuring system dependability, operational efficiency, and security depends on the identification of anomalies in these dynamic and resource-constrained systems. Due to their high computational requirements and inability to efficiently process continuous data streams, traditional anomaly detection techniques often fail in IoT systems. This work presents a resource-efficient adaptive anomaly detection model for real-time streaming data in IoT systems. Extensive experiments were carried out on multiple real-world datasets, achieving an average accuracy score of 96.06% with an execution time close to 7.5 milliseconds for each individual streaming data point, demonstrating its potential for real-time, resource-constrained applications. The model uses Principal Component Analysis (PCA) for dimensionality reduction and a Z-score technique for anomaly detection. It maintains a low computational footprint with a sliding window mechanism, enabling incremental data processing and identification of both transient and sustained anomalies without storing historical data. The system uses a Multivariate Linear Regression (MLR) based imputation technique that estimates missing or corrupted sensor values, preserving data integrity prior to anomaly detection. The suggested solution is appropriate for many uses in smart cities, industrial automation, environmental monitoring, IoT security, and intelligent transportation systems, and is particularly well-suited for resource-constrained edge devices.
Keywords: Anomaly detection streaming data IOT IIoT TMoT REAL-TIME LIGHTWEIGHT modeling
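The sliding-window Z-score idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' model: it handles a single variable, omits the PCA and MLR imputation stages, and the function name `zscore_stream`, the window size, and the threshold are all assumptions made for illustration.

```python
from collections import deque
import math


def zscore_stream(values, window=20, threshold=3.0):
    """Yield (value, is_anomaly) pairs using only the last `window` points.

    Hypothetical helper: a point is flagged when it lies more than
    `threshold` standard deviations from the mean of the current window.
    No history beyond the window is stored.
    """
    buf = deque(maxlen=window)
    for x in values:
        if len(buf) == window:  # score only once the window is full
            mean = sum(buf) / window
            var = sum((v - mean) ** 2 for v in buf) / window
            std = math.sqrt(var)
            is_anomaly = std > 0 and abs(x - mean) / std > threshold
        else:
            is_anomaly = False
        yield x, is_anomaly
        buf.append(x)  # the oldest point is evicted automatically


readings = [10.0, 10.1, 9.9, 10.0, 10.2, 9.8, 10.1, 50.0, 10.0]
flags = [flag for _, flag in zscore_stream(readings, window=5)]
```

In this sketch a flagged point still enters the window, so a large spike inflates the statistics for the next few points; a production detector would likely treat that case explicitly.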
Modeling and Performance Evaluation of Streaming Data Processing System in IoT Architecture
2
Authors: Feng Zhu, Kailin Wu, Jie Ding. Computers, Materials & Continua, 2025, No. 5, pp. 2573-2598 (26 pages)
With the widespread application of Internet of Things (IoT) technology, the processing of massive real-time streaming data poses significant challenges to the computational and data-processing capabilities of systems. Although distributed streaming data processing frameworks such as Apache Flink and Apache Spark Streaming provide solutions, meeting stringent response time requirements while ensuring high throughput and resource utilization remains an urgent problem. To address this, the study proposes a formal modeling approach based on Performance Evaluation Process Algebra (PEPA), which abstracts the core components and interactions of cloud-based distributed streaming data processing systems. Additionally, a generic service flow generation algorithm is introduced, enabling the automatic extraction of service flows from the PEPA model and the computation of key performance metrics, including response time, throughput, and resource utilization. The novelty of this work lies in the integration of PEPA-based formal modeling with the service flow generation algorithm, bridging the gap between formal modeling and practical performance evaluation for IoT systems. Simulation experiments demonstrate that optimizing the execution efficiency of components can significantly improve system performance. For instance, increasing the task execution rate from 10 to 100 improves system performance by 9.53%, while further increasing it to 200 results in a 21.58% improvement. However, diminishing returns are observed when the execution rate reaches 500, with only a 0.42% gain. Similarly, increasing the number of TaskManagers from 10 to 20 improves response time by 18.49%, but the improvement slows to 6.06% when increasing from 20 to 50, highlighting the importance of co-optimizing component efficiency and resource management to achieve substantial performance gains. This study provides a systematic framework for analyzing and optimizing the performance of IoT systems for large-scale real-time streaming data processing. The proposed approach not only identifies performance bottlenecks but also offers insights into improving system efficiency under different configurations and workloads.
Keywords: System modeling performance evaluation streaming data process IoT system PEPA
An Efficient Modelling of Oversampling with Optimal Deep Learning Enabled Anomaly Detection in Streaming Data (cited by: 2)
3
Authors: R. Rajakumar, S. Sathiya Devi. China Communications (SCIE, CSCD), 2024, No. 5, pp. 249-260 (12 pages)
Recently, anomaly detection (AD) in streaming data has gained significant attention among research communities due to its applicability in finance, business, healthcare, education, etc. Recent developments in deep learning (DL) models have proved helpful in the detection and classification of anomalies. This article designs an oversampling with optimal deep learning-based streaming data classification (OS-ODLSDC) model. The aim of the OS-ODLSDC model is to recognize and classify anomalies in the streaming data. The proposed OS-ODLSDC model initially undergoes a preprocessing step. Since streaming data is unbalanced, the support vector machine (SVM)-Synthetic Minority Over-sampling Technique (SVM-SMOTE) is applied for the oversampling process. Besides, the OS-ODLSDC model employs bidirectional long short-term memory (BiLSTM) for AD and classification. Finally, the root mean square propagation (RMSProp) optimizer is applied for optimal hyperparameter tuning of the BiLSTM model. To assess the performance of the OS-ODLSDC model, a wide-ranging experimental analysis is performed using three benchmark datasets: CICIDS 2018, KDD-Cup 1999, and NSL-KDD.
Keywords: anomaly detection deep learning hyperparameter optimization OVERSAMPLING SMOTE streaming data
An Optimal Big Data Analytics with Concept Drift Detection on High-Dimensional Streaming Data (cited by: 1)
4
Authors: Romany F. Mansour, Shaha Al-Otaibi, Amal Al-Rasheed, Hanan Aljuaid, Irina V. Pustokhina, Denis A. Pustokhin. Computers, Materials & Continua (SCIE, EI), 2021, No. 9, pp. 2843-2858 (16 pages)
Big data streams have become ubiquitous in recent years, thanks to the rapid generation of massive volumes of data by different applications. It is challenging to apply existing data mining tools and techniques directly to these big data streams. At the same time, streaming data from several applications results in two major problems: class imbalance and concept drift. The current research paper presents a new Multi-Objective Metaheuristic Optimization-based Big Data Analytics with Concept Drift Detection (MOMBD-CDD) method on High-Dimensional Streaming Data. The presented MOMBD-CDD model has different operational stages such as pre-processing, CDD, and classification. The MOMBD-CDD model overcomes the class imbalance problem with the Synthetic Minority Over-sampling Technique (SMOTE). To determine the oversampling rates and neighboring point values of SMOTE, the Glowworm Swarm Optimization (GSO) algorithm is employed. Besides, the Statistical Test of Equal Proportions (STEPD), a CDD technique, is also utilized. Finally, a Bidirectional Long Short-Term Memory (Bi-LSTM) model is applied for classification. To improve classification performance and to compute the optimum parameters for the Bi-LSTM model, a GSO-based hyperparameter tuning process is carried out. The performance of the presented model was evaluated using high-dimensional benchmark streaming datasets, namely the intrusion detection (NSL KDDCup) dataset and the ECUE spam dataset. An extensive experimental validation process confirmed the effective outcome of the MOMBD-CDD model. The proposed model attained high accuracy of 97.45% and 94.23% on the applied KDDCup99 dataset and ECUE Spam dataset, respectively.
Keywords: streaming data concept drift classification model deep learning class imbalance data
A Novel Outlier Detection with Feature Selection Enabled Streaming Data Classification
5
Authors: R. Rajakumar, S. Sathiya Devi. Intelligent Automation & Soft Computing (SCIE), 2023, No. 2, pp. 2101-2116 (16 pages)
Due to advancements in information technologies, massive quantities of data are being produced by social media, smartphones, and sensor devices. The investigation of data streams using machine learning (ML) approaches to address regression, prediction, and classification problems has received considerable interest. At the same time, the detection of anomalies or outliers and feature selection (FS) processes become important. This study develops an outlier detection with feature selection technique for streaming data classification, named the ODFST-SDC technique. Initially, streaming data is pre-processed in two ways, namely categorical encoding and null value removal. In addition, Local Correlation Integral (LOCI) is used for the detection and removal of outliers. Besides, a red deer algorithm (RDA) based FS approach is employed to derive an optimal subset of features. Finally, a kernel extreme learning machine (KELM) classifier is used for streaming data classification. The design of LOCI-based outlier detection and RDA-based FS constitutes the novelty of the work. To assess the classification outcomes of the ODFST-SDC technique, a series of simulations was performed using three benchmark datasets. The experimental results reported promising outcomes of the ODFST-SDC technique over recent approaches.
Keywords: streaming data classification outlier removal feature selection machine learning metaheuristics
Research and Simulation on the Small-scale Streaming Data Transmission Communication System based on ARM and FPGA
6
Author: Yuzhu Ren. International Journal of Technology Management, 2016, No. 10, pp. 72-74 (3 pages)
In this paper, we conduct theoretical research on a small-scale streaming data transmission communication system based on ARM and FPGA. Compared with network-layer IP multicast, its implementation does not need to change the underlying structure of the network. Embedded technology and GPRS technology are maturing rapidly: embedded systems are small, highly specialized, and simple, and GPRS networks offer worldwide coverage. On this basis, this paper proposes a new ARM and FPGA based small-scale streaming data transmission communication system. The implementation of the system proves its effectiveness.
Keywords: ARM and FPGA SMALL-SCALE streaming data Communication System
Tracking Nonstationary Streaming Data via Exponentially Weighted Moving Average Stochastic Gradient Descent
7
Authors: QIAN Chengde, JIANG Haiyan, LIANG Decai. Journal of Systems Science & Complexity, 2025, No. 5, pp. 2084-2107 (24 pages)
In many applications involving data streams, the sequences of data arise from highly dynamic and often unstable real-life processes, rendering untenable the standard assumption that current and future data come from the same distribution. In response, new methodologies, such as dynamic online learning, have been proposed in order to account for the nonstationary features in the data-generating process. Motivated by the stability and statistical efficiency of the notable stochastic approximation method, average stochastic gradient descent (ASGD), in time-invariant systems, the authors propose an exponentially weighted moving average (EWMA)-based stochastic gradient descent (SGD) which accommodates the dynamic structure by introducing a forgetting factor and replacing the simple averaging step in ASGD with an EWMA step. Provided that the dynamic drift is Lipschitz continuous, the mean squared tracking error rate of the proposed method achieves the optimal rate in the nonparametric statistical paradigm. The proposed framework also allows us to derive the dynamic regret bound and asymptotic normality with a path variation constraint in a natural manner. Numerical analysis has been conducted to verify the performance of the proposed method. In particular, the proposed method is much more robust to the selection of learning rates compared with the ordinary SGD method.
Keywords: Moving average online gradient descent streaming data varying coefficient model
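The step described in the abstract above, replacing the running average of ASGD with an exponentially weighted moving average, can be sketched on a toy problem. The abstract does not give the paper's update rule, so the form below is an assumption: a scalar parameter tracks the mean of a stream under squared loss, and the names `ewma_sgd`, `lr`, and `forgetting` are hypothetical.

```python
def ewma_sgd(stream, lr=0.5, forgetting=0.2, theta0=0.0):
    """Track a drifting scalar target with SGD plus EWMA smoothing.

    Assumed loss per observation y: 0.5 * (theta - y) ** 2, whose
    gradient with respect to theta is (theta - y).
    """
    theta = theta0      # raw SGD iterate
    smoothed = theta0   # EWMA of the iterates (takes the place of the ASGD average)
    history = []
    for y in stream:
        grad = theta - y
        theta -= lr * grad
        # The forgetting factor down-weights old iterates, so the estimate
        # can follow a target that moves over time.
        smoothed = (1.0 - forgetting) * smoothed + forgetting * theta
        history.append(smoothed)
    return history


# The target jumps from 1.0 to 3.0 halfway through the stream.
estimates = ewma_sgd([1.0] * 200 + [3.0] * 200)
```

A plain running average over all 400 iterates would settle near 2.0 here, whereas the EWMA estimate ends close to the current target of 3.0.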
How many probe vehicles are enough for identifying traffic congestion?--a study from a streaming data perspective (cited by: 2)
8
Authors: Handong WANG, Yang YUE, Qingquan LI. Frontiers of Earth Science (SCIE, CAS, CSCD), 2013, No. 1, pp. 34-42 (9 pages)
Many studies have used vehicle trajectories to analyze traffic conditions, for instance, to identify traffic congestion. However, there is no systematic study of the number of probe vehicles and the sampling interval needed to identify traffic congestion accurately. Moreover, most related studies ignore the streaming nature of trajectory data. This paper first presents a novel method of identifying traffic congestion that takes the streaming nature of vehicle trajectories into account. Instead of processing the whole data stream, a series of snapshots is extracted. Congested road segments can be identified by analyzing the evolution of clusters across a series of adjacent snapshots. We then evaluated a series of parameter settings and their corresponding congestion identification accuracy. The results have implications for probe vehicle deployment and traffic analysis; for example, when 5% of probe vehicles are available, 85% identification accuracy can be reached if the sampling time interval is 10 s.
Keywords: vehicle streaming data traffic trajectory data floating car data CONGESTION
Super point detection based on sampling and data streaming algorithms
9
Authors: 程光, 强士卿. Journal of Southeast University (English Edition) (EI, CAS), 2009, No. 2, pp. 224-227 (4 pages)
In order to improve the precision of super point detection and control measurement resource consumption, this paper proposes a super point detection method based on sampling and data streaming algorithms (SDSD), and proves that only sources or destinations with a lot of flows can be sampled probabilistically using the SDSD algorithm. The SDSD algorithm uses both the IP table and the flow bloom filter (BF) data structures to maintain the IP and flow information. The IP table is used to judge whether an IP address has been recorded. If the IP exists, then all its subsequent flows will be recorded into the flow BF; otherwise, the IP flow is sampled. This paper also analyzes the accuracy and memory requirements of the SDSD algorithm, and tests them using the CERNET trace. The theoretical analysis and experimental tests demonstrate that most relative errors of the super points estimated by the SDSD algorithm are less than 5%, whereas those of other algorithms are about 10%. Because of the BF structure, the SDSD algorithm is also better than previous algorithms in terms of memory consumption.
Keywords: super point flow sampling data streaming
Improved Data Stream Clustering Method: Incorporating KD-Tree for Typicality and Eccentricity-Based Approach
10
Authors: Dayu Xu, Jiaming Lu, Xuyao Zhang, Hongtao Zhang. Computers, Materials & Continua (SCIE, EI), 2024, No. 2, pp. 2557-2573 (17 pages)
Data stream clustering is integral to contemporary big data applications. However, addressing the ongoing influx of data streams efficiently and accurately remains a primary challenge in current research. This paper aims to improve the efficiency and precision of data stream clustering. Taking the TEDA (Typicality and Eccentricity Data Analysis) algorithm as a foundation, we integrate a nearest neighbor search algorithm to enhance both the efficiency and the accuracy of the algorithm. The original TEDA algorithm, grounded in the concept of "Typicality and Eccentricity Data Analytics", is an evolving and recursive method that requires no prior knowledge. While the algorithm autonomously creates and merges clusters as new data arrives, its efficiency is significantly hindered by the need to traverse all existing clusters whenever further data arrives. This work presents the NS-TEDA (Neighbor Search Based Typicality and Eccentricity Data Analysis) algorithm, which incorporates a KD-Tree (K-Dimensional Tree) algorithm integrated with the Scapegoat Tree. This ensures that newly arrived data points interact solely with clusters in very close proximity. It significantly enhances algorithm efficiency while preventing a single data point from joining too many clusters and, to some extent, mitigating the merging of clusters with high overlap. We apply the NS-TEDA algorithm to several well-known datasets, comparing its performance with other data stream clustering algorithms and the original TEDA algorithm. The results demonstrate that the proposed algorithm achieves higher accuracy, and its runtime exhibits almost linear dependence on the volume of data, making it more suitable for large-scale data stream analysis research.
Keywords: data stream clustering TEDA KD-TREE scapegoat tree
Clustering algorithm for multiple data streams based on spectral component similarity (cited by: 1)
11
Authors: 邹凌君, 陈崚, 屠莉. Journal of Southeast University (English Edition) (EI, CAS), 2008, No. 3, pp. 264-266 (3 pages)
A new algorithm for clustering multiple data streams is proposed. The algorithm can effectively cluster data streams which show similar behavior with some unknown time delays. The algorithm uses the autoregressive (AR) modeling technique to measure correlations between data streams. It exploits estimated frequency spectra to extract the essential features of streams. Each stream is represented as the sum of spectral components and the correlation is measured component-wise. Each spectral component is described by four parameters, namely, amplitude, phase, damping rate and frequency. The ε-lag-correlation between two spectral components is calculated. The algorithm uses such information as similarity measures in clustering data streams. Based on a sliding window model, the algorithm can continuously report the most recent clustering results and adjust the number of clusters. Experiments on real and synthetic streams show that the proposed clustering method has a higher speed and clustering quality than other similar methods.
Keywords: data streams CLUSTERING AR model spectral component
Data partitioning based on sampling for power load streams
12
Authors: 王永利, 徐宏炳, 董逸生, 钱江波, 刘学军. Journal of Southeast University (English Edition) (EI, CAS), 2005, No. 3, pp. 293-298 (6 pages)
A novel data stream partitioning method is proposed to resolve problems of range-aggregation continuous queries over parallel streams for the power industry. The first step of this method is to sample the data in parallel, which is implemented as an extended reservoir-sampling algorithm. A skip factor based on the change ratio of data values is introduced to describe the distribution characteristics of data values adaptively. The second step of this method is to partition the fluxes of data streams evenly, which is implemented with two alternative equal-depth histogram generating algorithms that fit different cases: one for incremental maintenance based on heuristics and the other for periodical updates to generate an approximate partition vector. The experimental results on actual data prove that the method is efficient, practical and suitable for time-varying data stream processing.
Keywords: data streams continuous queries parallel processing sampling data partitioning
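For readers unfamiliar with the two building blocks named in the abstract above, here is a minimal sketch of classic reservoir sampling followed by equal-depth boundaries computed from the sample. It does not reproduce the paper's extended algorithm, its skip factor, or its incremental histogram maintenance; `reservoir_sample` and `equal_depth_boundaries` are hypothetical names.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Keep a uniform random sample of k items from a stream of unknown length."""
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)      # fill the reservoir first
        else:
            j = rng.randint(0, i)       # inclusive on both ends
            if j < k:
                reservoir[j] = item     # replace with probability k / (i + 1)
    return reservoir


def equal_depth_boundaries(sample, parts):
    """Return parts - 1 cut points that split the sample into equal-count buckets."""
    ordered = sorted(sample)
    n = len(ordered)
    return [ordered[(i * n) // parts] for i in range(1, parts)]


sample = reservoir_sample(range(1000), 10, random.Random(0))
cuts = equal_depth_boundaries(list(range(100)), 4)
```

The cut points can then serve as an approximate partition vector: each incoming value is routed to the bucket whose range contains it, so the buckets receive roughly equal shares of the stream.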
Min-wise hash function-based sampling over distributed data streams
13
Authors: 崇志宏, 倪巍伟, 徐立臻, 吕建华, 谢英豪. Journal of Southeast University (English Edition) (EI, CAS), 2009, No. 4, pp. 456-459 (4 pages)
In order to avoid the redundant and inconsistent information in distributed data streams, a sampling method based on min-wise hash functions is designed and the practical semantics of the union of distributed data streams is defined. First, for each family of min-wise hash functions, the data with the minimum hash value are selected as local samples and the biased effect caused by frequent updates in a single data stream is filtered out. Secondly, for the same hash function, the sample with the minimum hash value is selected as the global sample and the local samples are combined at the center node to filter out the biased effect of duplicated updates. Finally, based on the obtained uniform samples, several aggregations on the defined semantics of the union of data streams are precisely estimated. The results of comparison tests on synthetic and real-life data streams demonstrate the effectiveness of this method.
Keywords: data streams AGGREGATION min-wise hashing
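The two-stage selection in the abstract above (a local minimum per hash function, then a global minimum at the center node) can be sketched as follows. This is an illustration under assumptions, not the paper's method: salted SHA-256 stands in for a true min-wise independent hash family, and the names `salted_hash`, `local_samples`, and `merge_samples` are hypothetical.

```python
import hashlib


def salted_hash(item, seed):
    """Stand-in for the seed-th member of a min-wise hash family (assumption)."""
    digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def local_samples(stream, num_hashes):
    """Per node: for each hash function keep the item with the minimum hash."""
    best = [None] * num_hashes
    for item in stream:
        for seed in range(num_hashes):
            hv = salted_hash(item, seed)
            if best[seed] is None or hv < best[seed][0]:
                best[seed] = (hv, item)
    return best


def merge_samples(per_node):
    """Center node: for each hash function keep the minimum across nodes."""
    merged = []
    for seed in range(len(per_node[0])):
        candidates = [node[seed] for node in per_node if node[seed] is not None]
        merged.append(min(candidates, key=lambda c: c[0]) if candidates else None)
    return merged


node_a = ["a", "b", "b", "b", "c"]   # "b" is updated frequently on one node
node_b = ["c", "d", "a"]             # "a" and "c" are duplicated across nodes
merged = merge_samples([local_samples(node_a, 4), local_samples(node_b, 4)])
direct = local_samples(["a", "b", "c", "d"], 4)
```

Because an item always hashes to the same value, repeated or duplicated updates cannot change which item wins, so `merged` equals the sample drawn directly from the set of distinct items.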
N-DD: New Approach for Drift Detection Based on Neutrosophic Support Vector Machine
14
Author: Rania Lutfi. Journal of Harbin Institute of Technology (New Series), 2025, No. 3, pp. 82-90 (9 pages)
Many real-world machine learning applications face the challenge of data that changes over time, known as concept drift, and the issue of data indeterminacy, where it is unrealistic to expect all the true labels to be available. This can lead to a decrease in the accuracy of the prediction models. The aim of this study is to introduce a new approach for detecting drift, which is based on neutrosophic set theory. This approach takes into account uncertainty in the prediction model and is able to handle indeterminate information, considering its impact on the model's performance. The proposed method reads data into windows and calculates a set of values based on the concept of neutrosophic membership. These values are then used in the Neutrosophic Support Vector Machine (N-SVM). To address the issue of indeterminate true label data, the values issued by N-SVM are expressed as entropy and used as input for the ADWIN (Adaptive Windowing) change detector. When a drift is detected, the prediction model is retrained by including only the most recent instances with the original training data set. The proposed method gives promising results in terms of drift detection accuracy compared with existing drift detection methods such as KSWIN, ADWIN, and DWM.
Keywords: drift detection indeterminate labels UNCERTAINTY neutrosophic set theory data stream
Design and Application of a New Distributed Dynamic Spatio-Temporal Privacy Preserving Mechanisms
15
Authors: Jiacheng Xiong, Xingshu Chen, Xiao Lan, Liangguo Chen. Computers, Materials & Continua, 2025, No. 8, pp. 2273-2303 (31 pages)
In the era of big data, the growing number of real-time data streams often contains a lot of sensitive privacy information. Releasing or sharing this data directly without processing will lead to serious privacy information leakage. This poses a great challenge to conventional privacy protection mechanisms (CPPM). The existing data partitioning methods ignore the number of data replications and information exchanges, resulting in complex distance calculations and inefficient indexing for high-dimensional data. Therefore, CPPM often fails to meet the stringent requirements of efficiency and reliability, especially in dynamic spatiotemporal environments. Addressing this concern, we proposed the Principal Component Enhanced Vantage-point tree (PEV-Tree), which is an enhanced data structure based on the idea of dimension reduction, and constructed a Distributed Spatio-Temporal Privacy Preservation Mechanism (DST-PPM) on it. In this work, principal component analysis and the vantage tree are used to establish the PEV-Tree. In addition, we designed three distributed anonymization algorithms for data streams. These algorithms, named CK-AA, CL-DA, and CT-CA, fulfill the anonymization rules of K-Anonymity, L-Diversity, and T-Closeness, respectively, and have different computational complexities and reliabilities. The higher the complexity, the lower the risk of privacy leakage. DST-PPM can reduce the dimension of high-dimensional information while preserving data characteristics and dividing the data space into vantage points based on distance. It effectively enhances the data processing workflow and increases algorithm efficiency. To verify the validity of the method in this paper, we conducted empirical tests of CK-AA, CL-DA, and CT-CA on conventional datasets and the PEV-Tree, respectively. Based on the big data background of the Internet of Vehicles, we conducted experiments using artificially simulated on-board network data. The results demonstrated that the operational efficiency of CK-AA, CL-DA, and CT-CA is enhanced by 15.12%, 24.55%, and 52.74%, respectively, when deployed on the PEV-Tree. Simultaneously, during homogeneity attacks, the probabilities of information leakage were reduced by 2.31%, 1.76%, and 0.19%, respectively. Furthermore, these algorithms showcased superior utility (scalability) when executed across PEV-Trees of varying scales in comparison to their performance on conventional data structures. This indicates that DST-PPM offers marked advantages over CPPM in terms of efficiency, reliability, and scalability.
Keywords: Privacy preserving distributed anonymization algorithm VP-Tree data stream internet of vehicles
Fast wireless sensor for anomaly detection based on data stream in an edge-computing-enabled smart greenhouse (cited by: 5)
16
Authors: Yihong Yang, Sheng Ding, Yuwen Liu, Shunmei Meng, Xiaoxiao Chi, Rui Ma, Chao Yan. Digital Communications and Networks (SCIE, CSCD), 2022, No. 4, pp. 498-507 (10 pages)
Edge-computing-enabled smart greenhouses are a representative application of the Internet of Things (IoT) technology, which can monitor the environmental information in real-time and employ the information to contribute to intelligent decision-making. In the process, anomaly detection for wireless sensor data plays an important role. However, the traditional anomaly detection algorithms originally designed for anomaly detection in static data do not properly consider the inherent characteristics of the data stream produced by wireless sensors such as infiniteness, correlations, and concept drift, which may pose a considerable challenge to anomaly detection based on data stream and lead to low detection accuracy and efficiency. First, the data stream is usually generated quickly, which means that the data stream is infinite and enormous. Hence, any traditional off-line anomaly detection algorithm that attempts to store the whole dataset or to scan the dataset multiple times for anomaly detection will run out of memory space. Second, there exist correlations among different data streams, and traditional algorithms hardly consider these correlations. Third, the underlying data generation process or distribution may change over time. Thus, traditional anomaly detection algorithms with no model update will lose their effects. Considering these issues, a novel method (called DLSHiForest) based on Locality-Sensitive Hashing and the time window technique is proposed to solve these problems while achieving accurate and efficient detection. Comprehensive experiments are executed using a real-world agricultural greenhouse dataset to demonstrate the feasibility of our approach. Experimental results show that our proposal is practical for addressing the challenges of traditional anomaly detection while ensuring accuracy and efficiency.
Keywords: Anomaly detection data stream DLSHiForest Smart greenhouse Edge computing
Dynamically Computing Approximate Frequency Counts in Sliding Window over Data Stream (cited by: 1)
17
Authors: NIE Guo-liang, LU Zheng-ding. Wuhan University Journal of Natural Sciences (EI, CAS), 2006, No. 1, pp. 283-288 (6 pages)
This paper presents two one-pass algorithms for dynamically computing frequency counts in a sliding window over a data stream, that is, computing frequency counts exceeding a user-specified threshold ε. The first algorithm constructs sub-windows and periodically deletes expired sub-windows in the sliding window, and each sub-window maintains a summary data structure. The first algorithm outputs at most 1/ε + 1 elements for frequency queries over the most recent N elements. The second algorithm adapts a multiple-level method to deal with the data stream. Once the sketch of the most recent N elements has been constructed, the second algorithm can provide the answers to frequency queries over the most recent n (n ≤ N) elements. The second algorithm outputs at most 1/ε + 2 elements. The analytical and experimental results show that our algorithms are accurate and effective.
Keywords: data stream sliding window approximation algorithms frequency counts
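The sub-window idea described in the abstract above can be illustrated with a small sketch. This is not the authors' algorithm: it assumes exact per-sub-window counters in place of their summary data structure, and the names `SubWindowFrequency`, `window_size`, and `sub_size` are hypothetical. It only shows how dropping whole expired sub-windows keeps the counts aligned with roughly the most recent N elements.

```python
from collections import Counter, deque


class SubWindowFrequency:
    """Frequency counts over a sliding window built from fixed-size sub-windows.

    Simplification (assumption): each sub-window keeps an exact Counter,
    not the compact summary structure used in the paper.
    """

    def __init__(self, window_size, sub_size):
        if sub_size <= 0 or window_size % sub_size != 0:
            raise ValueError("window_size must be a positive multiple of sub_size")
        self.sub_size = sub_size
        self.max_subs = window_size // sub_size
        self.subs = deque()      # entries are [Counter, number_of_items]
        self.totals = Counter()  # counts over all live sub-windows

    def add(self, item):
        # Open a new sub-window when there is none or the newest one is full.
        if not self.subs or self.subs[-1][1] == self.sub_size:
            self.subs.append([Counter(), 0])
            # Expire the oldest sub-window once the window holds too many.
            if len(self.subs) > self.max_subs:
                expired, _ = self.subs.popleft()
                for key, count in expired.items():
                    self.totals[key] -= count
                    if self.totals[key] <= 0:
                        del self.totals[key]
        newest = self.subs[-1]
        newest[0][item] += 1
        newest[1] += 1
        self.totals[item] += 1

    def covered(self):
        # Number of elements currently represented (between N - sub_size + 1 and N
        # once the window has filled).
        return sum(n for _, n in self.subs)

    def frequent(self, epsilon):
        # Items whose count exceeds epsilon times the number of covered elements.
        threshold = epsilon * self.covered()
        return {item: c for item, c in self.totals.items() if c > threshold}
```

Because expiry happens one sub-window at a time, the answer refers to slightly fewer than N elements right after an expiry; a smaller `sub_size` narrows that gap at the cost of more sub-windows to maintain.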
THRFuzzy: Tangential holoentropy-enabled rough fuzzy classifier to classification of evolving data streams (Cited by: 1)
18
Authors: Jagannath E. Nalavade, T. Senthil Murugan. 《Journal of Central South University》 (SCIE, EI, CAS, CSCD), 2017, No. 8, pp. 1789-1800 (12 pages)
Rapid developments in telecommunications, sensor data, financial applications, and data stream analysis have increased the rate of data arrival, which makes data mining a vital process. Data analysis consists of different tasks, among which data stream classification faces more challenges than other commonly used techniques. Even though classification is a continuous process, it requires a design that can adapt the classification model to concept change or to changes in the boundaries between classes. Hence, we design a novel fuzzy classifier, THRFuzzy, to classify newly arriving data streams. Rough set theory, together with the tangential holoentropy function, helps in designing the dynamic classification model. The approach uses kernel fuzzy c-means (FCM) clustering to generate the rules and the tangential holoentropy function to update the membership function. The performance of THRFuzzy is verified on three datasets (skin segmentation, localization, and breast cancer) using accuracy and time as metrics, and is compared with the HRFuzzy and adaptive k-NN classifiers. The experimental results show that the THRFuzzy classifier achieves better classification results, with the maximum accuracy and minimal time among the compared classifiers.
Keywords: data stream classification fuzzy rough set tangential holoentropy concept change
Subspace Clustering in High-Dimensional Data Streams: A Systematic Literature Review
19
Authors: Nur Laila Ab Ghani, Izzatdin Abdul Aziz, Said Jadid AbdulKadir. 《Computers, Materials & Continua》 (SCIE, EI), 2023, No. 5, pp. 4649-4668 (20 pages)
Clustering high-dimensional data is challenging because increasing dimensionality increases the distance between data points, resulting in sparse regions that degrade clustering performance. Subspace clustering is a common approach for processing high-dimensional data by finding relevant features for each cluster in the data space. Subspace clustering methods extend traditional clustering to account for the constraints imposed by data streams. Data streams are not only high-dimensional, but also unbounded and evolving. This necessitates the development of subspace clustering algorithms that can handle high dimensionality and adapt to the unique characteristics of data streams. Although many articles have contributed to the literature review on data stream clustering, there is currently no specific review on subspace clustering algorithms in high-dimensional data streams. Therefore, this article aims to systematically review the existing literature on subspace clustering of data streams in high-dimensional streaming environments. The review follows a systematic methodological approach and includes 18 articles for the final analysis. The analysis focused on two research questions related to the general clustering process and dealing with the unbounded and evolving characteristics of data streams. The main findings relate to six elements: clustering process, cluster search, subspace search, synopsis structure, cluster maintenance, and evaluation measures. Most algorithms use a two-phase clustering approach consisting of an initialization stage, a refinement stage, a cluster maintenance stage, and a final clustering stage. The density-based top-down subspace clustering approach is more widely used than the others because it is able to distinguish true clusters and outliers using projected microclusters. Most algorithms implicitly adapt to the evolving nature of the data stream by using a time fading function that is sensitive to outliers. Future work can focus on the clustering framework, parameter optimization, subspace search techniques, memory-efficient synopsis structures, explicit cluster change detection, and intrinsic performance metrics. This article can serve as a guide for researchers interested in high-dimensional subspace clustering methods for data streams.
Keywords: CLUSTERING subspace clustering projected clustering data stream stream clustering high dimensionality evolving data stream concept drift
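The time fading function mentioned in the abstract above can be made concrete with a small sketch. The abstract does not give a formula, so this assumes the exponential form f(Δt) = 2^(−λ·Δt), a common choice in the stream clustering literature; the class name `FadingWeight` and the parameter `decay_rate` (λ) are hypothetical. The sketch tracks the weight of a single microcluster, so that old points contribute less over time without being stored.

```python
class FadingWeight:
    """Exponentially faded weight of one microcluster (illustrative only)."""

    def __init__(self, decay_rate):
        self.decay_rate = decay_rate  # lambda; larger values forget faster
        self.weight = 0.0
        self.last_update = None       # timestamp of the most recent point

    def add(self, timestamp):
        # Fade the accumulated weight by 2^(-lambda * elapsed time),
        # then count the newly arrived point with weight 1.
        if self.last_update is not None:
            elapsed = timestamp - self.last_update
            self.weight *= 2.0 ** (-self.decay_rate * elapsed)
        self.weight += 1.0
        self.last_update = timestamp
        return self.weight
```

With `decay_rate=1.0`, each time unit halves the existing weight, so a microcluster that stops receiving points fades toward zero and can eventually be pruned.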
Monitoring correlative financial data streams by local pattern similarity
20
Authors: Tao JIANG, Yu-cai FENG, Bin ZHANG, Zhong-sheng CAO, Ge FU, Jie SHI. 《Journal of Zhejiang University-Science A (Applied Physics & Engineering)》 (SCIE, EI, CAS, CSCD), 2009, No. 7, pp. 937-951 (15 pages)
Developing tools for monitoring the correlations among thousands of financial data streams in an online fashion is interesting and useful work. We aimed to find highly correlated financial data streams in local patterns. A novel distance metric function, slope duration distance (SDD), is proposed, which is compatible with the characteristics of actual financial data streams. Moreover, a model for monitoring correlations among local patterns (MCALP) is presented, which dramatically decreases the computational cost by using an algorithm for quick online segmenting and pruning (QONSP), with O(1) time cost at each time tick t, and a newly proposed grid structure. Experimental results showed that MCALP improves performance by several orders of magnitude relative to traditional naive linear scan techniques while maintaining high precision. Furthermore, the model is incremental, parallelizable, and has a quick response time.
Keywords: data mining Model data streams Correlation Local pattern Pattern similarity