Abstract
This paper proposes FSUNT, a novel feature selection algorithm based on the Hilbert-Schmidt independence criterion (HSIC), with a focus on the vagueness and uncertainty that can arise during feature selection. For text data in which the class labels are uncertain but the other feature values are certain, the algorithm evaluates the Hilbert-Schmidt dependence between each feature and the uncertain class labels, ranks the features accordingly, and selects the final feature subset. Extensive experiments on real and simulated datasets show that the algorithm achieves good classification performance and stability.
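As an illustration of ranking features by their HSIC dependence on uncertain class labels, the following is a minimal sketch and not the FSUNT algorithm itself, whose details are not given in this abstract. It assumes that each document's uncertain label is represented as a probability vector over the classes, that linear kernels are used for both features and labels, and that the standard biased empirical estimator HSIC = tr(KHLH) / (n-1)^2 is applied. The function names `hsic` and `rank_features_by_hsic` are hypothetical.

```python
import numpy as np


def hsic(K, L):
    """Biased empirical HSIC estimate: tr(K H L H) / (n - 1)^2."""
    n = K.shape[0]
    # Centering matrix H = I - (1/n) * 1 1^T
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2


def rank_features_by_hsic(X, P):
    """Rank the columns of X by HSIC with soft labels P.

    X: (n_docs, n_features) feature matrix, e.g. term counts.
    P: (n_docs, n_classes) label probabilities (assumed representation
       of uncertain class labels; each row sums to 1).
    Returns (order, scores), where order lists feature indices from the
    highest score to the lowest.
    """
    L = P @ P.T  # linear kernel on the label distributions (assumption)
    scores = np.array([
        hsic(np.outer(X[:, j], X[:, j]), L)  # linear kernel on feature j
        for j in range(X.shape[1])
    ])
    order = np.argsort(-scores)
    return order, scores


if __name__ == "__main__":
    # Toy data: 6 documents, 3 features, 2 classes with uncertain labels.
    X = np.array([[3, 1, 1], [2, 1, 0], [3, 1, 1],
                  [0, 1, 0], [0, 1, 1], [1, 1, 0]], dtype=float)
    P = np.array([[0.9, 0.1]] * 3 + [[0.2, 0.8]] * 3)
    order, scores = rank_features_by_hsic(X, P)
    print("feature ranking:", order)
    print("HSIC scores:", scores)
```

A top-k subset can then be taken as `order[:k]`. Other kernels, such as a Gaussian kernel on the features, could be substituted without changing the ranking procedure.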
Source
Journal on Communications (《通信学报》)
EI
CSCD
Peking University Core Journals (北大核心)
2009, Issue 8, pp. 32-38, 44 (8 pages in total)
Funding
National High Technology Research and Development Program of China ("863" Program) (2006AA01Z451, 2007AA01Z474, 2007AA010502)
National Natural Science Foundation of China (60873204)
Keywords
feature selection
uncertain data
uncertain text
text classification