Abstract
This paper proposes a phased nonlinear clustering method that handles clustering of large-scale distributed data effectively while reducing computational complexity. The algorithm has two phases. First, the data are partitioned into several spherically distributed sub-clusters, and K-nearest-neighbor graph theory is used to compute vertex energies on the original data and to extract the high-energy vertex samples. Second, a K-nearest-neighbor algorithm partitions these high-energy samples, which yields a coarse partition based on the high-energy samples and, at the same time, an estimate of the number of clusters. Finally, the results of the two clusterings are combined to obtain the final clustering. The main advantages of the method are that it can handle complex clustering problems, that it is comparatively stable, and that it reduces the cost of computing similarity measures on large-scale distributed data while maintaining clustering accuracy.
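The two-phase procedure summarized in the abstract could be sketched roughly as follows. This is only an illustrative sketch under several assumptions, not the author's implementation: the abstract does not define "vertex energy", so it is taken here to be the inverse of the mean distance to the k nearest neighbors; plain k-means stands in for the step that produces spherical sub-clusters; the coarse partition of the high-energy samples is taken to be the connected components of their K-nearest-neighbor graph; and each sub-cluster is assigned the coarse label of the high-energy sample nearest to its centroid. All function names and parameters (`kmeans`, `knn_energy`, `knn_components`, `two_phase_cluster`, `n_sub`, `keep`) are hypothetical.

```python
import numpy as np


def kmeans(X, n_sub, n_iter=20, seed=0):
    """Lloyd's k-means, used here (as an assumption) to form spherical sub-clusters."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), n_sub, replace=False)]
    for _ in range(n_iter):
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for j in range(n_sub):
            if np.any(labels == j):  # leave an empty sub-cluster's center unchanged
                centers[j] = X[labels == j].mean(axis=0)
    return labels


def knn_energy(X, k):
    """Assumed energy: inverse of the mean distance to the k nearest neighbors."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    nearest = np.sort(dist, axis=1)[:, :k]
    return 1.0 / (nearest.mean(axis=1) + 1e-12)


def knn_components(X, k):
    """Label the connected components of the symmetrized k-nearest-neighbor graph."""
    n = len(X)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    neighbors = np.argsort(dist, axis=1)[:, :k]
    adjacency = [set() for _ in range(n)]
    for i in range(n):
        for j in neighbors[i]:
            adjacency[i].add(int(j))
            adjacency[int(j)].add(i)
    labels = -np.ones(n, dtype=int)
    count = 0
    for start in range(n):
        if labels[start] >= 0:
            continue
        labels[start] = count
        stack = [start]
        while stack:  # depth-first traversal of one component
            u = stack.pop()
            for v in adjacency[u]:
                if labels[v] < 0:
                    labels[v] = count
                    stack.append(v)
        count += 1
    return labels, count


def two_phase_cluster(X, n_sub=10, k=5, keep=0.5, seed=0):
    """Return (final labels, estimated number of clusters) for the rows of X."""
    X = np.asarray(X, dtype=float)
    # Phase 1: spherical sub-clusters plus high-energy samples from the kNN graph.
    sub = kmeans(X, n_sub, seed=seed)
    energy = knn_energy(X, k)
    high = np.where(energy >= np.quantile(energy, 1.0 - keep))[0]
    # Phase 2: coarse partition of the high-energy samples; the number of
    # components serves as the estimate of the number of clusters.
    coarse, n_clusters = knn_components(X[high], k)
    # Combine: each sub-cluster takes the coarse label of the high-energy
    # sample closest to its centroid (an assumed merging rule).
    final = np.empty(len(X), dtype=int)
    for j in np.unique(sub):
        members = sub == j
        centroid = X[members].mean(axis=0)
        nearest = np.linalg.norm(X[high] - centroid, axis=1).argmin()
        final[members] = coarse[nearest]
    return final, n_clusters
```

Calling `two_phase_cluster(X)` on an `(n, d)` array returns one label per row together with the estimated cluster count. The sketch builds full pairwise distance matrices for brevity, so it does not reflect the reduced similarity-computation cost that the abstract claims for the actual method.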
Author
丘威
QIU Wei (School of Computer Science, Jiaying University, Meizhou 514015, China)
Source
《电脑知识与技术》
2013, Issue 12, pp. 7767-7769 (3 pages)
Computer Knowledge and Technology
Funding
Supported by the Natural Science Foundation of Guangdong Province (No. S2013010013307)
Keywords
manifold data
data mining
clustering
nonlinear