Abstract
K-means clustering is an iterative algorithm widely used for clustering gene expression data. Given a specified number of clusters K and a clustering objective function, it updates the partition iteratively until the objective function reaches a minimum, which yields a reasonably good clustering. However, K-means depends strongly on its parameters, and the number of clusters cannot change during the clustering process. To address these drawbacks, the paper introduces the idea of dynamically adjusting the number of clusters, together with a multi-dimensional pseudo F-statistic, and proposes a dynamic K-means clustering algorithm for gene expression data based on that statistic. Experimental results show that the algorithm can adjust the number of clusters dynamically and identify the best number of clusters, thereby achieving good clustering quality.
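The abstract does not give the definition of the multi-dimensional pseudo F-statistic or the rule used to adjust K, so the following is only a minimal sketch under stated assumptions. It assumes the statistic has the common Calinski-Harabasz form, F = (B / (K - 1)) / (W / (n - K)), where B and W are the between-cluster and within-cluster sums of squares, and it replaces the paper's dynamic adjustment with a plain scan over candidate values of K. The function names `kmeans`, `pseudo_f`, and `choose_k` are hypothetical and are not taken from the paper.

```python
import numpy as np


def kmeans(X, k, n_iter=100):
    """Lloyd's K-means with a deterministic farthest-point initialisation."""
    # Seed with the point nearest the overall mean, then repeatedly add the
    # point farthest from all centres chosen so far.
    centers = [X[np.argmin(((X - X.mean(axis=0)) ** 2).sum(axis=1))]]
    while len(centers) < k:
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers, dtype=float)

    for _ in range(n_iter):
        # Assignment step: nearest centre by squared Euclidean distance.
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Update step: each centre moves to the mean of its members
        # (an empty cluster keeps its previous centre).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


def pseudo_f(X, labels):
    """Pseudo F-statistic, assumed here to be the Calinski-Harabasz form."""
    n = len(X)
    ids = np.unique(labels)
    k = len(ids)
    grand_mean = X.mean(axis=0)
    within = between = 0.0
    for j in ids:
        members = X[labels == j]
        mean_j = members.mean(axis=0)
        within += ((members - mean_j) ** 2).sum()
        between += len(members) * ((mean_j - grand_mean) ** 2).sum()
    return (between / (k - 1)) / (within / (n - k))


def choose_k(X, k_max=6):
    """Scan K = 2..k_max and return the K with the largest pseudo F."""
    scores = {k: pseudo_f(X, kmeans(X, k)[0]) for k in range(2, k_max + 1)}
    return max(scores, key=scores.get), scores
```

For an expression matrix `X` with one row per gene and one column per condition, `choose_k(X)` returns the candidate K with the highest score along with the score for each K tried. Scanning every K reruns the clustering from scratch each time; an algorithm that adjusts K during the iterations, as the abstract describes, would avoid that cost.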
Source
《系统仿真学报》
EI
CAS
CSCD
Peking University Core Journals (北大核心)
2006, Issue 3, pp. 586-589, 601 (5 pages)
Journal of System Simulation
Funding
Natural Science Foundation of Hunan Province (03JJY3095)
Keywords
clustering analysis
gene expression data
pseudo F-statistic
dynamic K-means clustering