Abstract
When processing massive data sets, the limited capacity of a single computer makes it difficult for traditional clustering algorithms to produce results in a reasonable amount of time. Building on a clustering algorithm based on density and adaptive density-reachability, this paper proposes a parallel clustering algorithm. Theoretical analysis and experimental results show that the algorithm achieves a near-linear speedup ratio and can handle large-scale data sets effectively.
Source
Computer and Modernization (《计算机与现代化》)
2010, No. 8, pp. 5-7, 14 (4 pages)
Funding
National Natural Science Foundation of China (Grant No. 40762003)
Natural Science Foundation of Inner Mongolia (Grant No. 200711020814)
Keywords
parallel clustering
massive data sets
cluster computer