Abstract
This paper analyzes the shortcomings of the BIRCH (Balanced Iterative Reducing and Clustering Using Hierarchies) clustering algorithm for large datasets and proposes an improved multi-threshold BIRCH algorithm based on the sum of squared deviations. The sum of squared deviations is used to establish the correlation between clusters, which improves on relying solely on the distance between cluster centers. The split factor is set to the maximum cluster diameter, which avoids the drawback of choosing it from empirical values. Finally, the improved algorithm is applied to the cluster analysis of graphical representation data of gene sequences.
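The two quantities named in the abstract can both be computed from a standard BIRCH clustering feature (N, LS, SS). The following is a minimal sketch under that assumption, not the paper's own formulation: the class name `CF` and the function `sse_increase` are hypothetical, and the merge criterion shown is the usual increase in the within-cluster sum of squared deviations, which may differ in detail from the measure the authors define.

```python
import math
from dataclasses import dataclass


@dataclass
class CF:
    """Clustering feature: point count, per-dimension linear sum, sum of squared norms."""
    n: int
    ls: tuple
    ss: float

    @classmethod
    def from_points(cls, points):
        dim = len(points[0])
        ls = tuple(sum(p[d] for p in points) for d in range(dim))
        ss = sum(x * x for p in points for x in p)
        return cls(len(points), ls, ss)

    def merge(self, other):
        # Clustering features are additive, so merging needs no raw points.
        ls = tuple(a + b for a, b in zip(self.ls, other.ls))
        return CF(self.n + other.n, ls, self.ss + other.ss)

    def sse(self):
        # Sum of squared deviations from the centroid: SS - ||LS||^2 / N.
        return self.ss - sum(x * x for x in self.ls) / self.n

    def diameter(self):
        # Root mean squared pairwise distance inside the cluster.
        if self.n < 2:
            return 0.0
        num = 2 * self.n * self.ss - 2 * sum(x * x for x in self.ls)
        return math.sqrt(max(num, 0.0) / (self.n * (self.n - 1)))


def sse_increase(a, b):
    """Increase in the sum of squared deviations caused by merging a and b."""
    return a.merge(b).sse() - a.sse() - b.sse()


a = CF.from_points([(0.0, 0.0), (2.0, 0.0)])
b = CF.from_points([(10.0, 0.0), (12.0, 0.0)])
print(sse_increase(a, b))                 # merge cost between the two clusters
print(max(a.diameter(), b.diameter()))    # candidate split factor: largest diameter
```

Unlike a pure centroid distance, the merge cost also depends on the sizes of the two clusters, which is one plausible reading of why the abstract treats it as a better measure of inter-cluster correlation.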
Source
Journal of Zhanjiang Normal College (《湛江师范学院学报》)
2009, No. 3, pp. 83-87 (5 pages)
Funding
Supported by the Key Project of the Natural Science Foundation of Hunan Province (06JJ4076)
Keywords
BIRCH algorithm
clustering feature
gene graphical representation data