Abstract
ROCK (a robust clustering algorithm for categorical attributes) is a hierarchical clustering method that measures the similarity between a pair of points by the number of common neighbors (links) they share, and it clusters high-dimensional, sparse, categorical data effectively. Computing the adjacency matrix is the step that dominates its time complexity. This paper offloads that step to the graphics processing unit (GPU), exploiting its floating-point throughput and the parallelism of its fragment processing, while the remaining steps run on the central processing unit (CPU). Experiments on a PC with an AMD 64 3500+ CPU and an NVIDIA GeForce 6800 GT graphics card show that the GPU-based ROCK algorithm runs faster than the same computation carried out entirely on the CPU. The improved hierarchical clustering algorithm is therefore suited to real-time, efficient clustering of large volumes of data in data-stream settings.
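The adjacency-matrix and link computation described in the abstract can be sketched on the CPU as follows. This is a minimal illustration only, not the paper's GPU implementation: the Jaccard similarity, the threshold name `theta`, the choice not to count a point as its own neighbor, and the sample `baskets` are all assumptions introduced here.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets of categorical items (assumed measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def adjacency_matrix(points, theta):
    """adj[i][j] = 1 when points i and j are neighbors (similarity >= theta).

    Assumption: a point is not counted as its own neighbor.
    """
    n = len(points)
    return [[1 if i != j and jaccard(points[i], points[j]) >= theta else 0
             for j in range(n)]
            for i in range(n)]


def link_matrix(adj):
    """links[i][j] = number of common neighbors of i and j (adj times adj)."""
    n = len(adj)
    return [[sum(adj[i][k] * adj[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


# Hypothetical sample data: four small transactions.
baskets = [{"a", "b", "c"}, {"a", "b", "d"}, {"a", "c", "d"}, {"x", "y"}]
adj = adjacency_matrix(baskets, theta=0.5)
links = link_matrix(adj)
```

Each entry of `adj` depends only on its own pair of points, so the entries can be computed independently of one another; that independence is what makes this step a natural candidate for data-parallel hardware.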
Source
《计算机应用研究》
CSCD
Peking University Core Journals (北大核心)
2008, No. 8, pp. 2319-2321, 2327 (4 pages in total)
Application Research of Computers
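Funding
National Natural Science Foundation of China (60603053, 60274026, 60373089, 60403002)
Hengyang Normal University Teaching Research Project (A267)
Hengyang Normal University Youth Research Fund (07A29)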
Keywords
clustering analysis
graphics processing unit (GPU)
general-purpose computation
hierarchical clustering