Application Research on a Phased Nonlinear Clustering Method for Large-Scale Distributed Data

Research on Large Scale Distribution Data Method of Nonlinear Clustering
Abstract: This paper proposes a phased nonlinear clustering method that handles the clustering of large-scale distributed data effectively while reducing computational complexity. The algorithm has two stages. First, the data are divided into several spherically distributed sub-clusters, and K-nearest-neighbor graph theory is applied to the original data to compute a vertex energy for each sample and to extract the high-energy samples. Second, a K-nearest-neighbor algorithm partitions these high-energy samples, which yields a coarse partition based on the high-energy samples and, at the same time, an estimate of the number of clusters. Finally, the two clustering results are combined to obtain the final clustering. The main advantages of the method are that it can handle complex clustering problems, that it is relatively stable, and that it lowers the cost of computing similarity measures on large-scale distributed data while maintaining clustering accuracy.
Author: QIU Wei (丘威), School of Computer Science, Jiaying University, Meizhou 514015, China
Source: Computer Knowledge and Technology (《电脑知识与技术》), 2013, No. 12, pp. 7767-7769 (3 pages)
Funding: Supported by the Natural Science Foundation of Guangdong Province (No. S2013010013307)
Keywords: stream data (流数据; rendered as "manifold data" in the source's English keywords); data mining; clustering; nonlinear
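The abstract names the steps of the method but gives no formulas, so the following is only a minimal sketch of the general idea, not the author's algorithm. It rests on several assumptions that do not come from the source: the vertex energy is taken to be the inverse of the mean distance to the k nearest neighbors; "high-energy samples" are the top `keep` fraction by that energy; the coarse partition is the set of connected components of a K-nearest-neighbor graph built on those samples; and the final step simply gives every point the label of its nearest high-energy sample, which stands in for the merging of spherical sub-clusters mentioned in the abstract. All function and parameter names are hypothetical.

```python
import math


def knn_distances(points, k):
    """For each point, return its k nearest neighbours as (distance, index) pairs."""
    result = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        result.append(dists[:k])
    return result


def vertex_energy(points, k):
    # ASSUMPTION: energy = inverse of the mean distance to the k nearest
    # neighbours (a density proxy). The source does not define the energy.
    neighbours = knn_distances(points, k)
    return [1.0 / (sum(d for d, _ in nb) / len(nb) + 1e-12) for nb in neighbours]


def two_stage_cluster(points, k=3, keep=0.5):
    """Return (labels, estimated_cluster_count) for a list of coordinate tuples."""
    if len(points) <= k:
        raise ValueError("need more than k points")

    # Stage 1: score every sample and keep the highest-energy ones.
    energy = vertex_energy(points, k)
    order = sorted(range(len(points)), key=lambda i: -energy[i])
    core = order[:max(k + 1, int(len(points) * keep))]
    core_pts = [points[i] for i in core]

    # Stage 2: connected components of the kNN graph on the high-energy
    # samples give a coarse partition and an estimate of the cluster count.
    parent = list(range(len(core)))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for a, nb in enumerate(knn_distances(core_pts, k)):
        for _, b in nb:
            parent[find(a)] = find(b)

    roots = sorted({find(a) for a in range(len(core))})
    label_of_root = {r: c for c, r in enumerate(roots)}
    core_labels = [label_of_root[find(a)] for a in range(len(core))]

    # Final step (ASSUMPTION): every sample inherits the label of its
    # nearest high-energy sample.
    labels = []
    for p in points:
        nearest = min(range(len(core)), key=lambda a: math.dist(p, core_pts[a]))
        labels.append(core_labels[nearest])
    return labels, len(roots)
```

Because only the high-energy subset enters the graph step, the component search runs on fewer samples than the full data set, which is in the spirit of the cost reduction the abstract claims. The brute-force neighbor search here is quadratic and is used only to keep the sketch self-contained. A global energy cutoff can also miss clusters that are much sparser than the rest, so `k` and `keep` would need tuning on real data.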


