Abstract
This paper proposes a new data preprocessing algorithm, AdaP, which effectively avoids overfitting and can also be used on its own. For imbalanced intrusion detection datasets, a cost-sensitive mechanism is introduced: following the idea of minimizing misclassification cost through a weight matrix, the method removes some training samples from dense regions, expands sparse regions, and filters noise, and finally combines AdaP with the boosting algorithm AdaCost. Experiments show that the combined strategy retains two advantages, the ability of boosting to raise the classification accuracy of the underlying weak classifier and the ability of preprocessing to balance rare-class data, and that it effectively improves classification performance on imbalanced intrusion detection data.
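The cost-sensitive idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the AdaP or AdaCost procedure from the paper. It only shows the generic rule of choosing the class with the lowest expected misclassification cost under a cost matrix. The function name `min_cost_class`, the two-class setup, and the cost values are assumptions made for illustration.

```python
def min_cost_class(probs, cost):
    """Return the index of the predicted class with the lowest expected cost.

    probs[i]   : estimated probability that the true class is i
    cost[i][j] : cost of predicting class j when the true class is i
    """
    n_classes = len(probs)
    # Expected cost of each candidate prediction j, averaged over true classes i.
    expected = [
        sum(probs[i] * cost[i][j] for i in range(n_classes))
        for j in range(n_classes)
    ]
    return min(range(n_classes), key=expected.__getitem__)


# Hypothetical two-class example: 0 = normal traffic, 1 = attack (rare class).
# Assumption: missing an attack costs 10 times as much as a false alarm.
COST = [
    [0.0, 1.0],   # true class 0: correct prediction, false alarm
    [10.0, 0.0],  # true class 1: missed attack, correct prediction
]

prediction = min_cost_class([0.8, 0.2], COST)
```

With these assumed costs, a sample that is only 20% likely to be an attack is still labeled as an attack, because the expected cost of missing it (2.0) exceeds the expected cost of a false alarm (0.8). With equal costs the same sample would be labeled normal.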
Source
Application Research of Computers (《计算机应用研究》)
CSCD
Peking University Core Journals (北大核心)
2009, No. 8, pp. 3036-3038, 3043 (4 pages)
Funding
Natural Science Foundation of Shanxi Province (Grant No. 2009011022-2)
Keywords
imbalanced data
data preprocessing
cost-sensitive
intrusion detection