Abstract: Anomaly detection is a longstanding and active research area that has many applications in domains such as finance, security, and manufacturing. However, the efficiency and performance of anomaly detection algorithms are challenged by the large-scale, high-dimensional, and heterogeneous data that are prevalent in the era of big data. Isolation-based unsupervised anomaly detection is a novel and effective approach for identifying anomalies in data. It relies on the idea that anomalies are few and different from normal instances, and thus can be easily isolated by random partitioning. Isolation-based methods have several advantages over existing methods, such as low computational complexity, low memory usage, high scalability, robustness to noise and irrelevant features, and no need for prior knowledge or heavy parameter tuning. In this survey, we review the state-of-the-art isolation-based anomaly detection methods, including their data partitioning strategies, anomaly score functions, and algorithmic details. We also discuss some extensions and applications of isolation-based methods in different scenarios, such as detecting anomalies in streaming data, time series, trajectory, and image datasets. Finally, we identify some open challenges and future directions for isolation-based anomaly detection research.
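The isolation idea summarized in the abstract (anomalies are few and different, so random partitioning separates them after only a few splits) can be illustrated with a minimal sketch. This is not any specific method reviewed in the survey: it assumes axis-parallel random splits in the style of an isolation forest, and the names `path_length` and `mean_path_length`, the defaults `n_trees=100`, `sample_size=64`, `max_depth=12`, and the synthetic data are all hypothetical choices made for illustration.

```python
import random


def path_length(point, data, rng, depth=0, max_depth=12):
    """Number of random axis-parallel splits needed to isolate `point`.

    Only the branch containing `point` is followed, so no tree is stored.
    `max_depth` is an assumed cap, not a value taken from the survey.
    """
    if len(data) <= 1 or depth >= max_depth:
        return depth
    dim = rng.randrange(len(point))          # pick a random feature
    lo = min(row[dim] for row in data)
    hi = max(row[dim] for row in data)
    if lo == hi:                             # cannot split on this feature
        return depth
    split = rng.uniform(lo, hi)              # pick a random cut point
    if point[dim] < split:
        subset = [row for row in data if row[dim] < split]
    else:
        subset = [row for row in data if row[dim] >= split]
    return path_length(point, subset, rng, depth + 1, max_depth)


def mean_path_length(point, data, n_trees=100, sample_size=64, seed=0):
    """Average isolation depth of `point` over many random subsamples."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_trees):
        sample = rng.sample(data, min(sample_size, len(data)))
        total += path_length(point, sample, rng)
    return total / n_trees


# Synthetic illustration: one dense cluster plus a single distant point.
gen = random.Random(1)
cluster = [(gen.gauss(0, 1), gen.gauss(0, 1)) for _ in range(200)]
data = cluster + [(8.0, 8.0)]
print("distant point:", mean_path_length((8.0, 8.0), data))
print("central point:", mean_path_length((0.0, 0.0), data))
```

Under these assumptions, a shorter average path suggests a more anomalous point. Published isolation-based methods typically convert the path length into a normalized anomaly score, which this sketch leaves out.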
Funding: Supported by the National Natural Science Foundation of China (No. 62076120) and by the State Key Laboratory for Novel Software Technology at Nanjing University, China (No. KFKT2024A01). Open Access funding enabled and organized by CAUL and its Member Institutions.