Abstract
This paper applies an improved Gini index to text feature selection and constructs a Gini-index-based evaluation function suited to that task. Using two different classifiers, fkNN and SVM, on two different document corpora, the proposed measure is compared experimentally with four other well-known text feature selection methods. The results show that its classification performance is comparable to that of the existing methods, while it performs well in terms of algorithmic time complexity.
Source
Journal of Computer Applications (《计算机应用》)
CSCD
PKU Core Journals (北大核心)
2007, Issue 10, pp. 2584-2586, 2590 (4 pages)
Keywords
text categorization
feature selection
Gini index
feature evaluation function