
Clustering for probabilistic data streams over sliding windows

Cited by: 2
Abstract: This paper proposes PWStream, a sliding-window-based clustering method for probabilistic data streams. PWStream uses an exponential histogram of cluster features to maintain a summary of the most recently arrived tuples, and it deletes expired tuples within an allowed error bound. For the probabilistic tuples in the stream, it introduces the concepts of strong, transitional, and weak clusters and designs a cluster selection strategy based on distance and existence probability, which allows more strong clusters to be found. Theoretical analysis and experimental results show that the method achieves good clustering quality and fast data processing.
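Source: Computer Engineering and Applications (《计算机工程与应用》), indexed in CSCD and the Peking University Core Journals list, 2011, No. 4, pp. 141-145 (5 pages)
Funding: Natural Science Foundation of Anhui Province (No. 090416247, No. 070412055); Natural Science Research Program of Anhui Higher Education Institutions (No. KJ2009B139); Research Funding Program for Young Teachers of Anhui Higher Education Institutions (No. 2008jq1143)
Keywords: probabilistic data stream; clustering; sliding window; histogram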
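
The abstract mentions an exponential histogram of cluster features that summarizes recent tuples and drops expired ones within a bounded error. The record does not give PWStream's actual data structures, so the following is only a minimal sketch of the general exponential-histogram idea applied to cluster features. The class names `Bucket` and `EHCF`, the parameter `eps`, and the choice to weight each tuple by its existence probability are all assumptions made for illustration, not the authors' definitions.

```python
from dataclasses import dataclass
from math import ceil
from typing import List, Sequence, Tuple


@dataclass
class Bucket:
    """Cluster feature for a run of consecutive tuples (hypothetical layout)."""
    count: int        # number of tuples summarized: 1, 2, 4, ...
    weight: float     # sum of existence probabilities (assumed weighting)
    ls: List[float]   # probability-weighted linear sum per dimension
    ss: List[float]   # probability-weighted squared sum per dimension
    newest: int       # timestamp of the most recent tuple in the bucket


class EHCF:
    """Sketch of an exponential histogram of cluster features."""

    def __init__(self, window: int, eps: float) -> None:
        self.window = window
        self.cap = ceil(1 / eps) + 1      # max buckets kept per bucket size
        self.buckets: List[Bucket] = []   # ordered oldest -> newest

    def insert(self, x: Sequence[float], p: float, t: int) -> None:
        """Add tuple x with existence probability p arriving at time t."""
        self.buckets.append(
            Bucket(1, p, [p * v for v in x], [p * v * v for v in x], t))
        self._expire(t)
        self._compact()

    def _expire(self, now: int) -> None:
        # A bucket is dropped once its newest tuple has left the window.
        while self.buckets and self.buckets[0].newest <= now - self.window:
            self.buckets.pop(0)

    def _compact(self) -> None:
        # Whenever a size has too many buckets, merge its two oldest ones,
        # then check the next size, since the merge may cascade.
        size = 1
        while True:
            idx = [i for i, b in enumerate(self.buckets) if b.count == size]
            if len(idx) <= self.cap:
                return
            i, j = idx[0], idx[1]
            a, b = self.buckets[i], self.buckets[j]
            merged = Bucket(a.count + b.count, a.weight + b.weight,
                            [u + v for u, v in zip(a.ls, b.ls)],
                            [u + v for u, v in zip(a.ss, b.ss)],
                            b.newest)
            self.buckets[i:j + 1] = [merged]
            size *= 2

    def summary(self) -> Tuple[float, List[float]]:
        """Return (total weight, weighted centroid) over all live buckets."""
        w = sum(b.weight for b in self.buckets)
        dim = len(self.buckets[0].ls)
        ls = [sum(b.ls[d] for b in self.buckets) for d in range(dim)]
        return w, [v / w for v in ls]
```

Because whole buckets are expired at once, the oldest bucket may briefly cover tuples on both sides of the window boundary; in this sketch a smaller `eps` keeps more buckets per size and so reduces that error at the cost of memory. The strong, transitional, and weak cluster thresholds are not given in this record, so they are left out here.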


