Abstract
Compared with the k-means algorithm, fuzzy C-means (FCM) introduces fuzzy membership degrees to account for the interaction between different data clusters, which avoids the problem of cluster centers converging to the same location. However, the fuzzy membership function exhibits trailing and upturned tails, which makes FCM sensitive to noise points and outliers. In addition, FCM tends to partition the data into clusters of equal size, so it is also sensitive to cluster size and performs poorly on imbalanced data clusters. To address these problems, this paper proposes a reliability-based robust fuzzy clustering algorithm (RRFCM). Based on the current clustering result, the algorithm analyzes the reliability of each sample point and uses this reliability together with local neighbor information to emphasize the separability between different data clusters. This improves robustness to noise, reduces sensitivity to imbalanced cluster sizes, and yields clustering results with better generalization performance. Compared with related algorithms, RRFCM achieves the best results in experiments on synthetic data sets, UCI real-world data sets, and image segmentation.
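For context, the textbook FCM update rules that the abstract builds on can be sketched as below. This is the standard FCM, not the RRFCM method proposed in the paper, whose reliability analysis and neighbor constraints are not specified in this record. The function names, the fuzzifier `m = 2`, and the toy data are assumptions made for illustration only.

```python
import math

def fcm_memberships(points, centers, m=2.0):
    """Return u[i][j], the membership of point i in cluster j (textbook FCM rule)."""
    power = 2.0 / (m - 1.0)
    memberships = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if any(d == 0.0 for d in dists):
            # The point coincides with one or more centers: split membership among them.
            zeros = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(zeros)
            memberships.append([z / total for z in zeros])
            continue
        # u_ij = d_ij^(-2/(m-1)) / sum_k d_ik^(-2/(m-1))
        inv = [d ** -power for d in dists]
        total = sum(inv)
        memberships.append([v / total for v in inv])
    return memberships

def fcm_centers(points, memberships, m=2.0):
    """Return cluster centers as means of the points weighted by u_ij^m."""
    n_clusters = len(memberships[0])
    dim = len(points[0])
    centers = []
    for j in range(n_clusters):
        weights = [u[j] ** m for u in memberships]
        total = sum(weights)
        centers.append(tuple(
            sum(w * p[k] for w, p in zip(weights, points)) / total
            for k in range(dim)
        ))
    return centers

# Hypothetical toy data: three points near the centers plus one distant outlier.
points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (5.0, 1000.0)]
centers = [(0.0, 0.0), (10.0, 0.0)]
u = fcm_memberships(points, centers)
new_centers = fcm_centers(points, u)
```

In this sketch each row of `u` sums to 1, so the distant outlier `(5.0, 1000.0)` still receives membership 0.5 in each cluster and pulls both centers toward it in `fcm_centers`. That normalization is one commonly cited source of the noise sensitivity the abstract describes.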
Authors
潘金艳
高朋
高云龙
谢有为
熊裕慧
PAN Jin-yan; GAO Peng; GAO Yun-long; XIE You-wei; XIONG Yu-hui (College of Information Engineering, Jimei University, Xiamen, Fujian 361021, China; College of Marine Navigation, Jimei University, Xiamen, Fujian 361021, China; College of Aeronautics and Astronautics, Xiamen University, Xiamen, Fujian 361101, China)
Source
《控制理论与应用》
EI
CAS
CSCD
PKU Core (北大核心)
2021, Issue 4, pp. 516-528 (13 pages)
Control Theory & Applications
Funding
National Natural Science Foundation of China (61203176)
Natural Science Foundation of Fujian Province (2013J05098, 2016J01756)
Keywords
fuzzy C-means (FCM)
size imbalance
ensemble learning
k-nearest neighbor constraint
local information