Research on an Improved Feature Selection Method and an NB Three-Way Classification Algorithm for Imbalanced Datasets (cited by: 2)
Abstract: Traditional feature selection methods perform poorly on imbalanced datasets. To address this, an improved feature selection method suited to imbalanced data classification is proposed. The method combines concentration and dispersion information and, because term frequency affects classification when texts vary in length, introduces a new term-frequency normalization, thereby improving on traditional feature extraction. In addition, the three-way decision idea is introduced into the Naive Bayes algorithm, yielding an NB three-way decision classification algorithm, which is applied to imbalanced datasets. Two groups of comparative experiments show that the improved feature selection method classifies highly imbalanced datasets better than the CHI and IG methods, and that, with the same feature selection method and dataset, the NB three-way classifier outperforms the NB classifier. Using the proposed feature selection method together with the NB three-way classifier gives some improvement in classification on datasets that are highly imbalanced and contain texts of varying length.
Authors: LIU Jie; SU Huizhe; LI Yancui (Department of Information Engineering, Henan Institute of Science and Technology, Xinxiang 453003, China)
Source: Journal of Henan Institute of Science and Technology (Natural Science Edition), 2018, No. 5, pp. 66-72 (7 pages)
Funding: National Natural Science Foundation of China (61502149)
Keywords: text categorization; imbalanced datasets; feature selection; three-way decision; Naive Bayes algorithm
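The abstract describes combining Naive Bayes with three-way decisions but gives no formulas, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a multinomial Naive Bayes model with Laplace smoothing and two illustrative thresholds, `alpha` and `beta` (both hypothetical values chosen here), that split the posterior probability into an accept region, a reject region, and a boundary region where the decision is deferred. The function names `train_nb`, `posterior`, and `three_way` are likewise made up for this example.

```python
import math
from collections import Counter


def train_nb(docs, labels):
    """Fit a multinomial Naive Bayes model with Laplace (add-one) smoothing.

    docs is a list of token lists; labels is a parallel list of class names.
    """
    vocab = {w for d in docs for w in d}
    classes = sorted(set(labels))
    # Log prior of each class from its relative frequency.
    prior = {c: math.log(labels.count(c) / len(labels)) for c in classes}
    counts = {c: Counter() for c in classes}
    for d, y in zip(docs, labels):
        counts[y].update(d)
    loglik = {}
    for c in classes:
        total = sum(counts[c].values()) + len(vocab)
        loglik[c] = {w: math.log((counts[c][w] + 1) / total) for w in vocab}
    return prior, loglik


def posterior(model, doc, target):
    """Return P(target | doc); tokens outside the vocabulary are ignored."""
    prior, loglik = model
    scores = {
        c: prior[c] + sum(loglik[c][w] for w in doc if w in loglik[c])
        for c in prior
    }
    # Normalise in log space for numerical stability.
    m = max(scores.values())
    z = sum(math.exp(s - m) for s in scores.values())
    return math.exp(scores[target] - m) / z


def three_way(p, alpha=0.7, beta=0.3):
    """Three-way decision on a posterior p; alpha and beta are assumed values."""
    if p >= alpha:
        return "accept"   # positive region
    if p <= beta:
        return "reject"   # negative region
    return "defer"        # boundary region: postpone the decision


# Toy data, invented purely for illustration.
docs = [["ball", "goal"], ["goal", "team"], ["stock", "market"]]
labels = ["sport", "sport", "finance"]
model = train_nb(docs, labels)
p_sport = posterior(model, ["goal"], "sport")
decision = three_way(p_sport)
```

In a three-way scheme of this kind, documents that land in the boundary region are not forced into a class; they can be re-examined with more features or another classifier, which is one plausible reason such a scheme could help on minority classes. How the paper sets its thresholds and handles deferred documents is not stated in the abstract.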