Abstract
In Web-based text information filtering systems, the feature subset found through feature selection directly affects both the speed and the accuracy of classification. To address this, the paper proposes a feature selection method that combines the CHI statistic with a genetic algorithm. The CHI statistic is first applied to the original feature set for an initial screening that removes redundant features and noise; a genetic algorithm then performs a second round of selection on the resulting subset. This yields an optimal feature subset representative of the problem space, reducing dimensionality and improving classification accuracy.
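The two-stage procedure described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the fitness function, the classifier, or the GA parameters, so everything beyond "CHI filter first, genetic algorithm second" is an assumption made for illustration: the term-vote classifier, the size penalty, tournament selection, one-point crossover, and all names (`chi_square`, `chi_filter`, `fitness`, `ga_select`) are hypothetical. Documents are represented as sets of words, and `ga_select` assumes at least two candidate terms.

```python
import random

def chi_square(docs, labels, term, cls):
    """CHI statistic of `term` for class `cls` from a 2x2 contingency table."""
    a = b = c = d = 0
    for words, label in zip(docs, labels):
        present, in_cls = term in words, label == cls
        if present and in_cls:
            a += 1          # term present, document in class
        elif present:
            b += 1          # term present, document not in class
        elif in_cls:
            c += 1          # term absent, document in class
        else:
            d += 1          # term absent, document not in class
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0

def chi_filter(docs, labels, k):
    """Stage 1: keep the k terms with the highest max-over-classes CHI score."""
    vocab = sorted(set().union(*docs))
    classes = sorted(set(labels))
    score = {t: max(chi_square(docs, labels, t, c) for c in classes) for t in vocab}
    return sorted(vocab, key=lambda t: -score[t])[:k]

def fitness(mask, terms, docs, labels, penalty=0.1):
    """Assumed fitness: training accuracy of a term-vote classifier minus a size penalty."""
    chosen = [t for t, bit in zip(terms, mask) if bit]
    if not chosen:
        return 0.0
    classes = sorted(set(labels))
    # Each chosen term votes for the class in which it occurs in the most documents.
    vote = {t: max(classes, key=lambda c: sum(t in w for w, l in zip(docs, labels) if l == c))
            for t in chosen}
    correct = 0
    for words, label in zip(docs, labels):
        tally = [vote[t] for t in chosen if t in words]
        if tally and max(sorted(set(tally)), key=tally.count) == label:
            correct += 1
    return correct / len(docs) - penalty * len(chosen) / len(terms)

def ga_select(terms, docs, labels, pop_size=20, generations=30, p_mut=0.05, seed=0):
    """Stage 2: evolve bit masks over the CHI-filtered terms (assumed GA settings)."""
    rng = random.Random(seed)
    score = lambda m: fitness(m, terms, docs, labels)
    pop = [[rng.randint(0, 1) for _ in terms] for _ in range(pop_size)]
    best = max(pop, key=score)
    for _ in range(generations):
        nxt = [best[:]]                                  # elitism: keep the best mask
        while len(nxt) < pop_size:
            p1 = max(rng.sample(pop, 2), key=score)      # binary tournament selection
            p2 = max(rng.sample(pop, 2), key=score)
            cut = rng.randrange(1, len(terms))           # one-point crossover
            child = [bit ^ (rng.random() < p_mut) for bit in p1[:cut] + p2[cut:]]
            nxt.append(child)
        pop = nxt
        best = max(pop, key=score)
    return [t for t, bit in zip(terms, best) if bit]

# Toy corpus (invented for illustration): each document is a set of words.
docs = [{"ball", "goal", "the"}, {"ball", "team", "the"}, {"goal", "team", "a"},
        {"code", "bug", "the"}, {"code", "chip", "a"}, {"bug", "chip", "the"}]
labels = ["sport", "sport", "sport", "tech", "tech", "tech"]

candidates = chi_filter(docs, labels, k=6)      # stage 1: CHI screening
selected = ga_select(candidates, docs, labels)  # stage 2: GA refinement
print(candidates, selected)
```

In this toy corpus the words "the" and "a" occur equally often in both classes, so their CHI score is zero and the first stage discards them before the genetic algorithm runs. In practice the fitness would more plausibly be the held-out accuracy of the actual text classifier, which this sketch replaces with a simple vote for brevity.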
Source
Information Technology and Informatization (《信息技术与信息化》)
2007, No. 1, pp. 43-44 (2 pages)
Keywords
Feature selection
CHI
Genetic algorithm (GA)