Posterior-probability-based Feature Selection Algorithm for Imbalanced Datasets

Cited by: 5
Abstract: A posterior-probability-based feature selection algorithm is proposed for imbalanced datasets. The algorithm introduces an imbalance factor estimated with the Parzen-window method and, taking the midpoints of Tomek links as initial values, iterates to find decision-boundary points at which the class posterior probabilities are equal. Projecting the normal vectors at these boundary points yields a weight for each feature, which indicates that feature's importance. Experimental results on three real-world datasets show that, on imbalanced data, the algorithm reduces dimensionality and computational cost without degrading overall classifier performance, and avoids the tendency of conventional feature selection algorithms to detect the majority class well while ignoring the minority class.
Source: Computer Engineering (《计算机工程》), indexed in CAS, CSCD, and the PKU Core Journals list, 2008, No. 19, pp. 1-3 (3 pages)
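Funding: a National Ministries and Commissions Basic Research Fund project; the Key Scientific Research Fund of the Ministry of Education (105087); the 2004 Ministry of Education Program for New Century Excellent Talents (NCET-04-0496); the Open Project Fund of the National Laboratory of Pattern Recognition; the Open Project Fund of the State Key Laboratory for Novel Software Technology, Nanjing University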
Keywords: imbalanced datasets; feature selection; posterior probability
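
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several pieces are assumptions: the abstract does not define the imbalance factor, so the hypothetical per-class weights `w` stand in for it (the default `(1, 1)` gives a plain Parzen-window comparison of posteriors); the iteration is replaced by bisection along each Tomek link, starting from its midpoint; the normal vector is approximated by a finite-difference gradient; and the "projection" is taken to be the absolute component of the unit normal along each feature axis. All function names and the toy data are illustrative.

```python
import numpy as np

def tomek_links(X, y):
    """Index pairs (i, j) that are mutual nearest neighbours of opposite class."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)          # a point is not its own neighbour
    nn = d.argmin(axis=1)
    return [(i, int(j)) for i, j in enumerate(nn)
            if nn[j] == i and y[i] != y[j] and i < j]

def score(x, X, y, h, w=(1.0, 1.0)):
    """g(x) = w1*P(1)p(x|1) - w0*P(0)p(x|0) with Gaussian Parzen windows.

    The Gaussian normalising constant is shared by both classes and is
    dropped; it does not move the zero set g(x) = 0 (equal posteriors).
    `w` is a hypothetical stand-in for the paper's imbalance factor.
    """
    k = np.exp(-((X - x) ** 2).sum(axis=1) / (2.0 * h * h))
    return (w[1] * k[y == 1].sum() - w[0] * k[y == 0].sum()) / len(X)

def boundary_point(a, b, g, iters=40):
    """Bisection on the segment a-b; the first point tested is the midpoint."""
    ga, gb = g(a), g(b)
    if ga * gb > 0:                      # no sign change along this link
        return None
    for _ in range(iters):
        m = 0.5 * (a + b)
        gm = g(m)
        if ga * gm <= 0:
            b, gb = m, gm
        else:
            a, ga = m, gm
    return 0.5 * (a + b)

def unit_normal(x, g, eps=1e-4):
    """Central-difference gradient of g at x, scaled to unit length."""
    grad = np.array([(g(x + eps * e) - g(x - eps * e)) / (2.0 * eps)
                     for e in np.eye(len(x))])
    n = np.linalg.norm(grad)
    return grad / n if n > 0 else None

def feature_weights(X, y, h=1.0, w=(1.0, 1.0)):
    """Average |normal component| per feature over all boundary points."""
    g = lambda x: score(x, X, y, h, w)
    normals = []
    for i, j in tomek_links(X, y):
        p = boundary_point(X[i], X[j], g)
        n = unit_normal(p, g) if p is not None else None
        if n is not None:
            normals.append(np.abs(n))
    if not normals:
        return np.zeros(X.shape[1])
    wts = np.mean(normals, axis=0)
    return wts / wts.sum()

# Toy imbalanced data: feature 0 separates the classes, feature 1 does not.
majority = [(-0.2, k) for k in range(10)] + [(-1.5, k) for k in range(10)]
minority = [(0.2, k) for k in (2, 5, 8)]
X = np.array(majority + minority, dtype=float)
y = np.array([0] * len(majority) + [1] * len(minority))

weights = feature_weights(X, y, h=0.5)
print(weights)  # feature 0 should receive nearly all of the weight
```

The pairwise distance matrix costs O(n²) memory, which is acceptable for a toy example but would need a nearest-neighbour index on larger datasets. Raising `w[1]` above `w[0]` shifts the estimated boundary toward the majority class, which is the kind of correction an imbalance factor is meant to provide.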