Abstract
This paper addresses outlier detection in data streams. It proposes an outlier measure for categorical data streams, the weighted frequent pattern outlier factor (WFPOF), and builds on it a fast detection algorithm, FODFP-Stream (fast outlier detection for high dimensional categorical data streams based on frequent pattern). FODFP-Stream computes the outlier measure by dynamically discovering and maintaining frequent patterns, which lets it handle high-dimensional categorical data streams effectively. It can also be extended to data streams with numerical (continuous) attributes and mixed attributes. Experiments on synthetic and real data sets confirm that the algorithm is applicable and effective.
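To make the idea of scoring records by frequent patterns concrete, the following is a minimal batch sketch. The abstract does not give the WFPOF formula or the streaming maintenance procedure, so everything here is an assumption for illustration: the function names `frequent_patterns` and `pattern_outlier_score`, the brute-force pattern mining, the pattern-length weight, and the normalization are hypothetical and are not the paper's definitions. A record that contains few frequent patterns receives a low score and is treated as more outlying.

```python
from collections import Counter
from itertools import combinations


def frequent_patterns(records, min_support, max_len=2):
    """Brute-force miner: return {pattern: support} for itemsets of size
    1..max_len whose relative support is at least min_support."""
    n = len(records)
    counts = Counter()
    for rec in records:
        items = sorted(rec)
        for k in range(1, max_len + 1):
            for combo in combinations(items, k):
                counts[frozenset(combo)] += 1
    return {p: c / n for p, c in counts.items() if c / n >= min_support}


def pattern_outlier_score(record, patterns, weight=len):
    """Share of weighted frequent-pattern support that the record covers.
    The weight (pattern length by default) is an assumed choice, not the
    paper's WFPOF weighting. Lower scores indicate more outlying records."""
    if not patterns:
        return 0.0
    total = sum(weight(p) * s for p, s in patterns.items())
    covered = sum(weight(p) * s for p, s in patterns.items() if p <= record)
    return covered / total


# Toy categorical data: each record is a set of (attribute, value) pairs.
normal = frozenset({("color", "red"), ("shape", "round")})
odd = frozenset({("color", "blue"), ("shape", "square")})
records = [normal, normal, normal, normal, odd]

patterns = frequent_patterns(records, min_support=0.5)
scores = [pattern_outlier_score(r, patterns) for r in records]
```

In this toy data the four identical records cover every frequent pattern, while the last record covers none, so it gets the lowest score. A streaming variant would have to update pattern supports incrementally as records arrive, which this sketch does not attempt.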
Source
《软件学报》
EI
CSCD
PKU Core Journal (北大核心)
2007, Issue 4, pp. 933-942 (10 pages)
Journal of Software
Funding
Supported by the National Natural Science Foundation of China under Grant No. 70371015 (国家自然科学基金)
the Doctor Science Research Foundation of the Education Ministry of China under Grant No. 20040286009 (国家教育部高等学校博士学科点科研基金)
Keywords
data stream
outlier detection
frequent pattern
high dimension
concept drift