Abstract
This paper presents a multi-objective supervised clustering genetic algorithm (GA). Samples are first grouped in a supervised manner by their class labels. Within each class, samples are then clustered, again under supervision, into class clusters according to the similarity of their attributes. If class clusters belonging to different class labels intersect, the intersecting clusters are clustered again, and this repeats until no two class clusters intersect. The fitness vector function is determined by two objectives: the number of class clusters and the within-class distance. Both the number of class clusters and their centers are determined automatically by this objective function, so these two key elements are free of subjective influence and are obtained through optimization. For classification (prediction), single-point class clusters are deleted, and each sample is classified using the class cluster number, the nearest-neighbor rule based on the distance to class cluster centers, and the class label of the selected cluster. The algorithm is implemented in C# and evaluated on three UCI data sets. The experimental results indicate that it outperforms the well-known Naive Bayes, Boost C4.5 and KNN algorithms.
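The abstract names the building blocks of the method without giving their formulas. The Python sketch below is a minimal, hypothetical illustration of those blocks under stated assumptions, not the authors' C# implementation: a class cluster is treated as a hypersphere (center = mean of its members, radius = distance to the farthest member); two clusters "intersect" when the distance between centers is less than the sum of radii; the within-class distance is the sum of Euclidean member-to-center distances; and both fitness objectives are minimized and compared by Pareto dominance. The GA encoding, crossover and mutation are not described in the abstract and are omitted. All names (`Cluster`, `fitness`, `dominates`, `cross_label_intersections`, `predict`) are illustrative.

```python
import math
from typing import List, Tuple

Point = Tuple[float, ...]


class Cluster:
    """A class cluster: a class label plus its member samples (assumed non-empty)."""

    def __init__(self, label: str, members: List[Point]):
        self.label = label
        self.members = members
        dim = len(members[0])
        # Assumption: the cluster center is the mean of its members.
        self.center = tuple(sum(p[i] for p in members) / len(members) for i in range(dim))
        # Assumption: the radius is the distance to the farthest member.
        self.radius = max(math.dist(p, self.center) for p in members)


def intersects(a: Cluster, b: Cluster) -> bool:
    # Assumed hypersphere overlap test.
    return math.dist(a.center, b.center) < a.radius + b.radius


def cross_label_intersections(clusters: List[Cluster]) -> List[Tuple[int, int]]:
    # Index pairs of intersecting clusters with different labels; in the paper's
    # scheme such pairs would be clustered again until this list is empty.
    n = len(clusters)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if clusters[i].label != clusters[j].label
            and intersects(clusters[i], clusters[j])]


def fitness(clusters: List[Cluster]) -> Tuple[int, float]:
    # Fitness vector: (number of class clusters, total within-cluster distance).
    within = sum(math.dist(p, c.center) for c in clusters for p in c.members)
    return (len(clusters), within)


def dominates(f: Tuple[int, float], g: Tuple[int, float]) -> bool:
    # Pareto dominance for minimization: no worse in every objective, better in one.
    return all(x <= y for x, y in zip(f, g)) and any(x < y for x, y in zip(f, g))


def predict(clusters: List[Cluster], x: Point) -> str:
    # Delete single-point clusters, then take the label of the nearest cluster center.
    kept = [c for c in clusters if len(c.members) > 1]
    nearest = min(kept, key=lambda c: math.dist(x, c.center))
    return nearest.label


# Toy example (hypothetical data, not from the paper's UCI experiments).
clusters = [
    Cluster("A", [(0.0, 0.0), (0.0, 1.0)]),
    Cluster("A", [(5.0, 5.0), (5.0, 6.0)]),
    Cluster("B", [(10.0, 0.0), (10.0, 1.0)]),
    Cluster("B", [(20.0, 20.0)]),  # single-point cluster, ignored by predict()
]

print(fitness(clusters))                    # (4, 3.0)
print(cross_label_intersections(clusters))  # []
print(predict(clusters, (19.0, 19.0)))      # A
```

In the toy example, the single-point cluster labeled `B` at (20, 20) is the closest cluster to the query (19, 19), but because single-point clusters are deleted before prediction, the nearest remaining center belongs to an `A` cluster. Pareto comparison is used here because the two objectives conflict: adding clusters lowers the within-class distance, so neither objective alone can fix the number of clusters.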
Source
Journal of Chengdu University (Natural Science Edition) (《成都大学学报(自然科学版)》)
2013, No. 1, pp. 58-60, 63 (4 pages)
Keywords
multi-objective GA
supervised clustering
class label
nearest neighbor rule