Abstract
This paper first introduces the partition-based k-means algorithm for document clustering. To address the algorithm's sensitivity to outliers, it proposes an improved k-means model for feature selection that removes outliers from the feature set and thereby improves feature clustering. A subsequent text classification experiment shows that the improved algorithm selects features effectively and makes text classification more efficient.
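The idea summarized above can be illustrated with a minimal sketch. The abstract does not give the paper's outlier criterion or feature representation, so everything here is an assumption made for illustration: each row of `X` is a hypothetical feature vector, the function names `kmeans` and `cluster_without_outliers` are invented, centers are initialized from the first `k` rows to keep the run deterministic, and a point counts as an outlier when its distance to its center exceeds the mean distance by more than `z` standard deviations.

```python
import numpy as np


def assign(X, centers):
    """Return the index of the nearest center for every row of X."""
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans(X, k, n_iter=100):
    """Plain k-means. Assumption: centers start at the first k rows of X."""
    centers = X[:k].astype(float).copy()
    for _ in range(n_iter):
        labels = assign(X, centers)
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members):  # leave an empty cluster's center unchanged
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign(X, centers), centers


def cluster_without_outliers(X, k, z=1.5):
    """Cluster once, drop far-away points, then cluster the rest again.

    Assumed outlier rule (not taken from the paper): a point is dropped
    when its distance to its own center exceeds mean + z * std of all
    such distances.
    """
    labels, centers = kmeans(X, k)
    d = np.linalg.norm(X - centers[labels], axis=1)
    keep = d <= d.mean() + z * d.std()
    labels_kept, centers_kept = kmeans(X[keep], k)
    return keep, labels_kept, centers_kept


if __name__ == "__main__":
    # Two small groups plus one distant point that pulls a center away.
    X = np.array([[0, 0], [10, 10], [0, 1], [1, 0],
                  [10, 11], [11, 10], [30, 30]])
    keep, labels_kept, centers_kept = cluster_without_outliers(X, k=2)
    print(keep)
    print(labels_kept)
    print(centers_kept)
```

In this toy data the distant point first drags the second center away from its group; after it is dropped, the second pass places that center inside the group. The threshold `z` is a tuning choice in this sketch and would need to be set for real data.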
Source
《微电子学与计算机》
CSCD
Peking University Core Journals
2009, No. 6, pp. 29-31, 35 (4 pages in total)
Microelectronics & Computer
Funding
National Natural Science Foundation of China project (70571087)
Keywords
feature selection
feature reduction
feature clustering
text classification