Abstract
This paper surveys established and recent feature selection methods based on scoring functions and evaluates their advantages, disadvantages, and range of applicability. It implements a new classification algorithm that weights words with a scoring function in place of the IDF function used in the TFIDF method. The paper then considers how to relax the feature independence assumption in feature selection and how to improve results by reducing the number of features with hierarchical classification algorithms.
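The idea of replacing the IDF factor with a scoring function can be illustrated with a small sketch. The abstract does not say which scoring function the authors use, so the chi-square statistic below is only an assumed stand-in, and the function names (`idf_weights`, `chi_square_weights`, `weight_document`) are hypothetical, not taken from the paper.

```python
import math
from collections import Counter


def idf_weights(docs):
    """Classic IDF weight per term: log(N / document frequency)."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def chi_square_weights(docs, labels):
    """Per-term weight: maximum chi-square statistic over all classes.

    Assumption: chi-square is used here only as one example of a scoring
    function; the paper may use a different one.
    """
    n_docs = len(docs)
    doc_sets = [set(doc) for doc in docs]
    vocab = set().union(*doc_sets)
    classes = set(labels)
    weights = {}
    for term in vocab:
        best = 0.0
        for cls in classes:
            # 2x2 contingency table: term present/absent vs. in/out of class.
            n11 = sum(1 for s, y in zip(doc_sets, labels) if term in s and y == cls)
            n10 = sum(1 for s, y in zip(doc_sets, labels) if term in s and y != cls)
            n01 = sum(1 for s, y in zip(doc_sets, labels) if term not in s and y == cls)
            n00 = n_docs - n11 - n10 - n01
            denom = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
            if denom:
                chi2 = n_docs * (n11 * n00 - n10 * n01) ** 2 / denom
                best = max(best, chi2)
        weights[term] = best
    return weights


def weight_document(doc, term_weights):
    """Term frequency times a per-term weight (IDF or a scoring function)."""
    tf = Counter(doc)
    return {term: count * term_weights.get(term, 0.0) for term, count in tf.items()}
```

On a toy corpus with two classes, a word that occurs in every document of one class and nowhere else scores higher under chi-square than a rarer word seen in only one document, whereas IDF ranks the rarer word higher. This difference is the reason a class-aware scoring function might be preferred for classification.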
Source
《清华大学学报(自然科学版)》
EI
CAS
CSCD
Peking University Core Journals
2001, No. 7, pp. 98-101 (4 pages)
Journal of Tsinghua University (Science and Technology)
Funding
National Natural Science Foundation of China (79990580)
National "973" Basic Research Program (G1998030414)
Tsinghua University Basic Research Fund
Keywords
text mining
bag of words
evaluation function
feature independence assumption
hierarchical tree
feature extraction
text classification