Abstract
The K-means clustering algorithm is a basic partitioning algorithm among the clustering-analysis methods used in data mining, and it is also the most widely used one. This paper aims to cluster the large volume of digital resources in a digital library more effectively and quickly. It addresses the problems of the traditional K-means algorithm and draws on the characteristics of digital resources to propose an improved algorithm for selecting initial clustering centers based on keyword feature vectors. On this basis, the traditional K-means clustering algorithm is improved for clustering digital resources, and the algorithm is verified experimentally. Analysis of the experimental results shows that the proposed algorithm reduces the cost of clustering digital resources and improves clustering efficiency, which demonstrates its feasibility.
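The abstract does not give the selection rule itself. The following is therefore only a minimal sketch of the general idea, built on several assumptions. Each resource is reduced to a keyword term-frequency vector, and similarity is cosine similarity. The seeding rule is hypothetical: start from the most "central" resource, then repeatedly add the resource least similar to every center already chosen. The function names `build_vectors`, `select_initial_centers` and `kmeans` are illustrative and are not taken from the paper.

```python
import math
from collections import Counter


def build_vectors(docs):
    """Map each document's keyword list to a term-frequency vector over a shared vocabulary."""
    vocab = sorted({kw for doc in docs for kw in doc})
    index = {kw: i for i, kw in enumerate(vocab)}
    vectors = []
    for doc in docs:
        vec = [0.0] * len(vocab)
        for kw, count in Counter(doc).items():
            vec[index[kw]] = float(count)
        vectors.append(vec)
    return vocab, vectors


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def select_initial_centers(vectors, k):
    """Hypothetical seeding rule (assumption, not the paper's algorithm):
    first pick the document with the highest total similarity to all others,
    then repeatedly pick the document whose best similarity to any chosen
    center is lowest."""
    n = len(vectors)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of documents")
    density = [sum(cosine(vectors[i], vectors[j]) for j in range(n) if j != i)
               for i in range(n)]
    centers = [max(range(n), key=lambda i: density[i])]
    while len(centers) < k:
        candidates = [i for i in range(n) if i not in centers]
        nxt = min(candidates,
                  key=lambda i: max(cosine(vectors[i], vectors[c]) for c in centers))
        centers.append(nxt)
    return centers


def kmeans(vectors, k, max_iter=100):
    """K-means seeded by select_initial_centers; each vector joins the most
    cosine-similar centroid, and centroids are recomputed as member means."""
    centroids = [list(vectors[i]) for i in select_initial_centers(vectors, k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [max(range(k), key=lambda c: cosine(v, centroids[c]))
                      for v in vectors]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Illustrative keyword lists (made-up data, not from the paper's experiments).
docs = [
    ["k-means", "clustering", "data mining"],
    ["clustering", "k-means", "partitioning"],
    ["digital library", "metadata", "retrieval"],
    ["digital library", "retrieval", "digital resource"],
]
vocab, vectors = build_vectors(docs)
labels = kmeans(vectors, 2)
print(labels)  # [0, 0, 1, 1]
```

Seeding from mutually dissimilar resources is one common way to avoid the sensitivity of plain K-means to random initial centers. Whether it matches the cost and efficiency gains the abstract reports depends on the paper's actual rule, which this sketch does not reproduce.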
Source
Computer Technology and Development (《计算机技术与发展》)
2014, No. 6, pp. 107-109, 113 (4 pages)
Funding
Natural Science Foundation of Hebei Province, General Program (F2013203324)
Keywords
K-means clustering algorithm
digital resource
similarity
initial clustering center