Abstract
To address the unequal-cost problem in spam filtering, where misclassifying a legitimate (ham) email as spam costs far more than misclassifying spam as ham, this paper proposes a combined classifier framework with a two-layer structure. Sample emails are pre-processed so that text features and behavioral features are used together. Building on improvements to each individual classifier, the framework optimizes the combination of different classifiers and adjusts the model promptly through feedback, giving the filter an efficient self-learning capability.
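The abstract does not give the algorithm itself, so the following is only a minimal Python sketch of the general idea, under stated assumptions: two toy first-layer classifiers (one text-based, one behavior-based), a second layer that takes a weighted average of their spam probabilities, a decision threshold derived from an assumed cost ratio, and a simple multiplicative weight update as the feedback step. The names `text_score`, `behavior_score`, `TwoLayerFilter`, and `cost_ratio` are hypothetical and are not taken from the paper.

```python
from typing import Callable, Dict, List

Email = Dict[str, object]
BaseClassifier = Callable[[Email], float]  # returns an estimate of P(spam)


def text_score(email: Email) -> float:
    """Hypothetical text-feature classifier: share of words in a toy spam list."""
    spam_words = {"free", "winner", "prize"}
    words = str(email["body"]).lower().split()
    if not words:
        return 0.5  # no evidence either way
    hits = sum(1 for w in words if w in spam_words)
    return min(1.0, 3.0 * hits / len(words))


def behavior_score(email: Email) -> float:
    """Hypothetical behavioral classifier: many recipients looks spam-like."""
    return min(1.0, int(email["num_recipients"]) / 50)


class TwoLayerFilter:
    """Layer 1: base classifiers. Layer 2: weighted vote with a cost-sensitive threshold."""

    def __init__(self, base: List[BaseClassifier], cost_ratio: float) -> None:
        # cost_ratio = (cost of ham judged as spam) / (cost of spam judged as ham).
        # Minimizing expected cost means flagging spam only when
        # P(spam) > cost_ratio / (cost_ratio + 1).
        self.base = base
        self.weights = [1.0] * len(base)
        self.threshold = cost_ratio / (cost_ratio + 1.0)

    def spam_probability(self, email: Email) -> float:
        scores = [clf(email) for clf in self.base]
        total = sum(self.weights)
        return sum(w * s for w, s in zip(self.weights, scores)) / total

    def is_spam(self, email: Email) -> bool:
        return self.spam_probability(email) > self.threshold

    def feedback(self, email: Email, true_is_spam: bool, rate: float = 0.5) -> None:
        """Shrink the weight of each base classifier in proportion to its error."""
        target = 1.0 if true_is_spam else 0.0
        for i, clf in enumerate(self.base):
            error = abs(clf(email) - target)
            self.weights[i] *= 1.0 - rate * error  # factor stays positive for rate < 1
```

With a large `cost_ratio` the threshold moves close to 1, so a borderline message is delivered rather than filtered, which is the behavior the unequal-cost setting calls for. The feedback step here is a plain multiplicative-weights rule chosen for brevity; the paper's own adjustment method may differ.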
Source
《计算机工程》
CAS
CSCD
PKU Core (Peking University Core Journals list)
2010, No. 18, pp. 194-196 (3 pages)
Computer Engineering
Funding
National High Technology Research and Development Program of China ("863" Program), Grant No. 2007AA01Z197
Keywords
spam filter
combinational classifier
two-layer structure
bit entropy
false positive rate