Abstract
The traditional K-Modes clustering algorithm has several shortcomings: its dissimilarity measure weakens intra-cluster similarity and ignores differences between attributes; its single-valued Modes overlook the possibility that an attribute may take a combination of several values; and its results depend heavily on the initial center points. To address these problems, the MAV-K-Modes algorithm is proposed, based on a dissimilarity measure for multi-attribute-value Modes, together with an initial center selection method based on pre-clustering. Experiments on UCI datasets show that, compared with the traditional K-Modes algorithm, MAV-K-Modes achieves markedly higher accuracy, class precision, and recall, and that the algorithm is well suited to parallelization.
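For context, the baseline that the abstract criticizes can be sketched as follows. This is a minimal illustration of traditional K-Modes with simple-matching dissimilarity and single-valued modes. It is not the MAV-K-Modes method, whose dissimilarity formula and pre-clustering step are not given in this record. The function names are hypothetical, and the initial modes are assumed to be supplied by the caller.

```python
from collections import Counter


def matching_dissimilarity(x, mode):
    """Simple-matching distance: count of attributes where x and mode differ."""
    return sum(a != b for a, b in zip(x, mode))


def column_modes(records):
    """Single-valued mode per attribute: the most frequent value in each column."""
    return [Counter(col).most_common(1)[0][0] for col in zip(*records)]


def k_modes(records, initial_modes, max_iter=100):
    """Baseline K-Modes: assign each record to its nearest mode, then update modes."""
    modes = [list(m) for m in initial_modes]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest mode under simple matching.
        new_labels = [
            min(range(len(modes)), key=lambda j: matching_dissimilarity(x, modes[j]))
            for x in records
        ]
        if new_labels == labels:
            break  # assignments are stable, so the modes will not change
        labels = new_labels
        # Update step: recompute each cluster's mode from its members.
        for j in range(len(modes)):
            members = [x for x, lab in zip(records, labels) if lab == j]
            if members:  # keep the old mode if a cluster is empty
                modes[j] = column_modes(members)
    return labels, modes


records = [("a", "x"), ("a", "y"), ("b", "z"), ("b", "z")]
labels, modes = k_modes(records, initial_modes=[("a", "x"), ("b", "z")])
```

Every mismatch counts as 1 regardless of attribute, and each mode holds exactly one value per attribute, which are the two limitations the abstract names. The result also depends on `initial_modes`, the third limitation.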
Authors
贾彬
梁毅
苏航
JIA Bin; LIANG Yi; SU Hang (Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China)
Source
Software Guide (《软件导刊》)
2019, No. 6, pp. 60-64, 69 (6 pages)
Funding
National Natural Science Foundation of China, Young Scientists Fund (61202074)