Applied Research on a Staged Nonlinear Clustering Method for Large-Scale Distributed Data

Published English title: Research on Large Scale Distribution Data Method of Nonlinear Clustering


Abstract: This paper proposes a staged nonlinear clustering method that handles clustering problems on large-scale distributed data effectively while reducing computational complexity. The algorithm has two stages. First, the data are divided into several spherically distributed subclusters; K-nearest-neighbor graph theory is used to compute a vertex energy for the original data, and the high-energy vertices are extracted as samples. Second, a K-nearest-neighbor algorithm partitions these high-energy samples, which yields a coarse partition based on the high-energy samples and, at the same time, an estimate of the number of clusters. Finally, the results of the two clusterings are combined to produce the final clustering. The main advantages of the method are that it can handle complex clustering problems, that it is relatively stable, and that it lowers the cost of computing similarity measures on large-scale distributed data while maintaining clustering accuracy.
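The staged procedure in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the abstract does not define the vertex energy, so `vertex_energy` below uses a hypothetical stand-in (the sum of Gaussian similarities to the k nearest neighbors). The function names, the `keep` fraction, the use of connected components for the coarse partition, and the nearest-sample merge step are likewise assumptions made for illustration. The initial split into spherical subclusters is omitted for brevity.

```python
import math


def knn_indices(points, k):
    """For each point, return the indices of its k nearest neighbours (brute force)."""
    result = []
    for i, p in enumerate(points):
        order = sorted((j for j in range(len(points)) if j != i),
                       key=lambda j: math.dist(p, points[j]))
        result.append(order[:k])
    return result


def vertex_energy(points, neighbours, sigma=1.0):
    """Hypothetical energy: sum of Gaussian similarities to the k nearest neighbours."""
    return [sum(math.exp(-math.dist(points[i], points[j]) ** 2 / (2 * sigma ** 2))
                for j in nbrs)
            for i, nbrs in enumerate(neighbours)]


def connected_components(n, edges):
    """Label the n vertices of an undirected graph by connected component (union-find)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


def staged_cluster(points, k=5, keep=0.5, sigma=1.0):
    """Return (labels, n_clusters) for a list of coordinate tuples."""
    # Stage 1: k-NN graph on all data, one energy per vertex, keep the top fraction.
    energy = vertex_energy(points, knn_indices(points, k), sigma)
    ranked = sorted(range(len(points)), key=lambda i: energy[i], reverse=True)
    core = ranked[:max(k + 1, int(keep * len(points)))]
    core_pts = [points[i] for i in core]
    # Stage 2: k-NN graph on the high-energy samples only; its connected
    # components give the coarse partition and the estimated cluster count.
    core_nbrs = knn_indices(core_pts, k)
    edges = [(a, b) for a, nbrs in enumerate(core_nbrs) for b in nbrs]
    core_labels = connected_components(len(core_pts), edges)
    # Merge (assumed rule): each point takes the label of its nearest high-energy sample.
    labels = []
    for p in points:
        nearest = min(range(len(core_pts)), key=lambda c: math.dist(p, core_pts[c]))
        labels.append(core_labels[nearest])
    return labels, len(set(core_labels))


# Toy data: two 5x5 unit grids placed far apart.
grid = [(float(x), float(y)) for x in range(5) for y in range(5)]
data = grid + [(x + 100.0, y + 100.0) for x, y in grid]
labels, n_clusters = staged_cluster(data, k=4, keep=0.5)
print(n_clusters, labels[0] != labels[25])
```

Restricting the second k-NN graph to the high-energy samples is what keeps the similarity computation small in this sketch: the pairwise work in stage 2 scales with the number of retained samples rather than with the full data set. The brute-force neighbor search here is quadratic and only suitable for toy inputs.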

Author: QIU Wei (丘威), School of Computer Science, Jiaying University, Meizhou 514015, China

Affiliation: School of Computer Science, Jiaying University

Source: Computer Knowledge and Technology (《电脑知识与技术》), 2013, No. 12, pp. 7767-7769 (3 pages)

Funding: Supported by the Natural Science Foundation of Guangdong Province (No. S2013010013307)

Keywords: stream data (given as "manifold data" in the published English keywords); data mining; clustering; nonlinear

Classification code: TP315 [Automation and Computer Technology: Computer Software and Theory]
