Abstract
To improve the quality of text clustering, the feature-word weights of the vector space model are recomputed using information entropy that is fed back continuously during the clustering process. Antibody-antigen affinity and antibody concentration are defined on the basis of text similarity. An antibody cloning and mutation strategy then searches for the cluster centers, with the number of clones and the mutation range controlled by each antibody's affinity and concentration. Finally, each text is assigned to the cluster whose center it is most similar to. Experiments show that the algorithm produces clustering results of relatively high quality and good stability.
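The abstract gives no formulas, so the following is only a minimal sketch of the general idea (clone and mutate candidate sets of cluster centers, then assign each text to its most similar center), not the authors' algorithm. Everything specific in it is an assumption made for illustration: cosine similarity as the text similarity, mean best-center similarity as the affinity, the share of near-identical antibodies as the concentration, the formulas for clone count and mutation scale, and all function and parameter names. The entropy-based reweighting of feature words is omitted.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two term-weight vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def assign(docs, centers):
    """Label each document with the index of its most similar center."""
    return [max(range(len(centers)), key=lambda k: cosine(d, centers[k]))
            for d in docs]


def affinity(docs, centers):
    """Assumed affinity: mean similarity of each document to its best center."""
    return sum(max(cosine(d, c) for c in centers) for d in docs) / len(docs)


def concentration(idx, population, threshold=0.9):
    """Assumed concentration: share of antibodies nearly identical to this one."""
    def flat(antibody):
        return [x for center in antibody for x in center]
    me = flat(population[idx])
    close = sum(1 for ab in population if cosine(me, flat(ab)) >= threshold)
    return close / len(population)


def mutate(centers, scale, rng):
    """Perturb every center with Gaussian noise, keeping weights non-negative."""
    return [[max(0.0, x + rng.gauss(0.0, scale)) for x in c] for c in centers]


def immune_clonal_cluster(docs, k, pop_size=6, generations=30,
                          max_clones=5, seed=0):
    """Each antibody is a candidate set of k cluster centers."""
    rng = random.Random(seed)
    population = [[list(d) for d in rng.sample(docs, k)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        next_pop = []
        for i, antibody in enumerate(population):
            aff = affinity(docs, antibody)
            conc = concentration(i, population)
            # Assumed rule: more clones for high affinity and low concentration.
            n_clones = max(1, round(max_clones * aff * (1.0 - conc)))
            # Assumed rule: smaller mutation range for high affinity.
            scale = 0.5 * (1.0 - aff)
            candidates = [antibody] + [mutate(antibody, scale, rng)
                                       for _ in range(n_clones)]
            # Keep the best of the parent and its mutated clones.
            next_pop.append(max(candidates, key=lambda c: affinity(docs, c)))
        population = next_pop
    best = max(population, key=lambda c: affinity(docs, c))
    return best, assign(docs, best)
```

A call such as `immune_clonal_cluster(vectors, k=3)` returns the best set of centers found and one cluster label per document, where `vectors` is a hypothetical list of equal-length, non-negative term-weight vectors. Because the parent always competes with its clones, the affinity of each antibody never decreases between generations.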
Source
Journal of Zhengzhou University: Natural Science Edition
CAS
Peking University Core Journals
2011, No. 1, pp. 46-49 (4 pages)
Funding
Science and Technology Research Project of the Chongqing Municipal Education Commission
Grant No. KJ091309
Keywords
text clustering
immune cloning
information entropy
affinity