Anomaly Detection in Data Streams Based on Tensor Analysis

Abstract: This paper formalizes the problem of continuous anomaly detection over distributed, evolving data streams and proposes WSTA, an anomaly detection algorithm based on tensor decomposition over sliding windows. WSTA treats the data stream on each distributed node as a sub-tensor of the global data stream. Through communication between the distributed nodes and the central node, each distributed node adaptively samples its sliding window to build a synopsis data matrix. Tensor decomposition of this matrix yields feature vectors, and a distance-based anomaly detection method is then applied to these features to find outliers. Experiments on a large number of real datasets show that the algorithm has good applicability and scalability.
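The detection pipeline summarized in the abstract can be illustrated with a minimal single-node sketch in Python. It rests on several assumptions that are not taken from the paper. A count-based window (a `deque` with a fixed `maxlen`) stands in for WSTA's sliding window. The adaptive sampling step and the communication between the distributed nodes and the central node are omitted, since the abstract does not specify them. Because a window matrix is a second-order tensor, the decomposition step is reduced to a truncated SVD of the mean-centered window (that is, PCA), computed here by power iteration rather than by the paper's tensor method. The final step uses the DB(p, D) outlier definition of Knorr and Ng (reference 4): a point is an outlier if at least a fraction p of all points lie farther than D from it. The names `WindowDetector`, `top_components` and `db_outliers` and the parameters `window`, `k`, `p` and `dist` are hypothetical.

```python
# Hypothetical sketch, NOT the WSTA algorithm from the paper:
# adaptive sampling and node/central-node communication are omitted,
# and the tensor decomposition is reduced to PCA via power iteration.
import math
import random
from collections import deque


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def top_components(rows, k, iters=200, seed=0):
    """Top-k right singular vectors of `rows` (equal-length lists), found by
    power iteration on the d x d Gram matrix X^T X. Each new vector is kept
    orthogonal to the ones already found (Gram-Schmidt)."""
    d = len(rows[0])
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
    rng = random.Random(seed)  # deterministic start vector
    comps = []
    for _ in range(min(k, d)):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            w = [_dot(row, v) for row in gram]  # w = (X^T X) v
            for c in comps:  # remove directions already extracted
                proj = _dot(w, c)
                w = [wi - proj * ci for wi, ci in zip(w, c)]
            norm = math.sqrt(_dot(w, w))
            if norm < 1e-12:  # no variance left in the remaining directions
                return comps
            v = [wi / norm for wi in w]
        comps.append(v)
    return comps


def db_outliers(points, p, dist):
    """Indices of DB(p, D)-outliers (Knorr & Ng): a point is an outlier if at
    least a fraction p of all points lie farther than `dist` from it."""
    n = len(points)
    result = []
    for i, a in enumerate(points):
        far = sum(1 for b in points if math.dist(a, b) > dist)
        if far >= p * n:
            result.append(i)
    return result


class WindowDetector:
    """Hypothetical count-based sliding window over one node's stream."""

    def __init__(self, window, k, p, dist):
        self.buf = deque(maxlen=window)  # oldest record is evicted automatically
        self.k, self.p, self.dist = k, p, dist

    def push(self, record):
        self.buf.append(tuple(record))

    def detect(self):
        rows = list(self.buf)
        means = [sum(col) / len(rows) for col in zip(*rows)]
        centered = [[x - m for x, m in zip(r, means)] for r in rows]
        comps = top_components(centered, self.k)
        # Feature vector of each record = its projection onto the top-k components.
        feats = [[_dot(r, c) for c in comps] for r in centered]
        return [rows[i] for i in db_outliers(feats, self.p, self.dist)]


if __name__ == "__main__":
    det = WindowDetector(window=8, k=1, p=0.8, dist=3.0)
    stream = [(50.0, 50.0, 50.0),  # evicted from the window before detect()
              (1.0, 2.0, 3.0), (1.1, 2.1, 2.9), (0.9, 1.9, 3.1), (1.0, 2.1, 3.0),
              (1.1, 1.9, 3.0), (0.9, 2.0, 2.9), (1.0, 2.0, 3.1),
              (8.0, -4.0, 3.0)]
    for rec in stream:
        det.push(rec)
    print(det.detect())  # expected: [(8.0, -4.0, 3.0)]
```

Projecting onto the leading components before computing distances keeps the quadratic distance step in k dimensions rather than in the full attribute space. The hand-written power iteration only keeps the sketch dependency-free; with NumPy available, `numpy.linalg.svd` on the centered window would be the usual replacement.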

Authors: Zhu Xueling, Lan Jun, Li Shouqi, Jia Yan

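Affiliations: College of Humanities and Social Sciences, National University of Defense Technology; College of Computer Science, National University of Defense Technology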

Source: Computer Engineering & Science (indexed in CSCD; Peking University core journal), 2009, no. 6, pp. 75-78 (4 pages)

Funding: Phase II of Project 985

Keywords: anomaly detection; distributed data streams; sliding window; tensor decomposition; adaptive sampling

CLC number: TP311 (Automation and Computer Technology: Computer Software and Theory)

References (5)

[1] Song G J, Tang S W, Yang D Q, Wang T J. Extraction and Trend Monitoring of Anomaly Patterns in Data Streams[J]. Journal of Computer Research and Development, 2004, 41(10): 1754-1759. (in Chinese)
[2] Yu H, Ma X L, Tan S H, Tang S W, Yang D Q. Mining Anomalous Slices Based on Singular Value Decomposition[J]. Journal of Software, 2005, 16(7): 1282-1288. (in Chinese)
[3] Mahoney M W, Maggioni M, Drineas P. Tensor-CUR Decompositions for Tensor-Based Data[C]//Proc of KDD'06, 2006: 327-336.
[4] Knorr E M, Ng R T. Algorithms for Mining Distance-Based Outliers in Large Datasets[C]//Proc of VLDB'98, 1998: 392-403.
[5] Ghoting A, Parthasarathy S, Otey M E. Fast Mining of Distance-Based Outliers in High Dimensional Datasets[J]. Data Mining and Knowledge Discovery, 2008, 16(3): 349-364.
