Abstract
A posterior-probability-based feature selection algorithm is proposed for imbalanced datasets. The algorithm introduces an imbalance factor estimated with the Parzen-window method and, taking the midpoints of Tomek links as initial values, iterates to find the decision-boundary points at which the posterior probabilities are equal. The weight of each feature, which indicates its importance, is obtained by a projection computation on the normal vectors at these points. Experimental results on three real-world datasets show that, without degrading the overall performance of the classifier, the algorithm effectively reduces dimensionality and computational cost, and avoids the shortcoming of conventional feature selection algorithms, which on imbalanced data tend to detect the majority class well while neglecting the minority class.
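The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: a Gaussian Parzen kernel with a fixed bandwidth h, an imbalance factor treated as a plain multiplicative weight on the minority-class density (the abstract does not give its definition), bisection along each Tomek link in place of the paper's unspecified iteration, and a central-difference gradient as the boundary normal. All function names are hypothetical; label 1 denotes the minority class and label 0 the majority class.

```python
import numpy as np

def tomek_links(X, y):
    """Pairs (i, j), i < j, of opposite-class points that are mutual nearest neighbours."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    nn = d.argmin(axis=1)
    return [(i, int(j)) for i, j in enumerate(nn)
            if i < j and nn[j] == i and y[i] != y[j]]

def parzen_density(x, S, h):
    """Gaussian Parzen-window density estimate at x from the sample set S."""
    sq = ((S - x) ** 2).sum(axis=1)
    norm = (2 * np.pi * h ** 2) ** (S.shape[1] / 2)
    return np.exp(-sq / (2 * h ** 2)).mean() / norm

def boundary_point(a, b, g, iters=40):
    """Bisect the segment a-b, starting from its midpoint, for a zero of g."""
    ga, gb = g(a), g(b)
    if ga * gb > 0:               # no sign change: fall back to the midpoint
        return (a + b) / 2
    for _ in range(iters):
        mid = (a + b) / 2
        gm = g(mid)
        if ga * gm <= 0:          # zero lies between a and mid
            b, gb = mid, gm
        else:                     # zero lies between mid and b
            a, ga = mid, gm
    return (a + b) / 2

def gradient(g, x, eps=1e-5):
    """Central-difference gradient of g at x (used as the boundary normal)."""
    grad = np.zeros_like(x, dtype=float)
    for k in range(x.size):
        step = np.zeros_like(x, dtype=float)
        step[k] = eps
        grad[k] = (g(x + step) - g(x - step)) / (2 * eps)
    return grad

def feature_weights(X, y, h=0.5, factor=1.0):
    """Average absolute components of the unit normals at the boundary points."""
    def g(x):  # assumed discriminant: zero where the weighted densities match
        return (factor * parzen_density(x, X[y == 1], h)
                - parzen_density(x, X[y == 0], h))
    normals = []
    for i, j in tomek_links(X, y):
        n = gradient(g, boundary_point(X[i], X[j], g))
        length = np.linalg.norm(n)
        if length > 0:
            normals.append(np.abs(n) / length)
    if not normals:               # no usable link: return uniform weights
        return np.full(X.shape[1], 1.0 / X.shape[1])
    w = np.mean(normals, axis=0)
    return w / w.sum()

# Toy data: the classes differ along feature 0; feature 1 carries no class signal.
maj = np.array([[x0, k] for x0 in (-0.4, -1.4) for k in range(5)], dtype=float)
mino = np.array([[0.4, k] for k in (0, 2, 4)], dtype=float)
X = np.vstack([maj, mino])
y = np.array([0] * len(maj) + [1] * len(mino))

links = tomek_links(X, y)
w = feature_weights(X, y)
print(links, w)
```

On this toy set the boundary normals point mostly along feature 0, so that feature receives the larger weight. In practice the bandwidth and the imbalance factor would need to be set as the paper specifies, which the abstract does not detail.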
Source
Computer Engineering (《计算机工程》)
CAS
CSCD
PKU Core Journal (北大核心)
2008, Issue 19, pp. 1-3 (3 pages)
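Funding
Basic Research Foundation of National Ministries and Commissions
Key Scientific Research Foundation of the Ministry of Education (105087)
Program for New Century Excellent Talents in University, Ministry of Education, 2004 (NCET-04-0496)
Open Project Fund of the National Laboratory of Pattern Recognition
Open Project Fund of the State Key Laboratory for Novel Software Technology, Nanjing University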
Keywords
imbalanced datasets
feature selection
posterior probability