Abstract
As the Internet has grown, spam has become an increasingly serious security problem. Feature selection is a key step in spam filtering and directly affects filtering precision. This paper analyzes three factors used to characterize the importance of a feature term (concentration, dispersion, and frequency) and examines how feature selection methods derive feature weights from term importance. On that basis it proposes a new feature selection method that combines the three factors to express how strongly a feature term contributes to classification, represents the resulting feature weight with the Logistic equation, and selects the features with the largest weights, that is, those with the greatest influence on classification. Experiments show that the method gives some improvement in the ability to identify spam.
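The abstract does not give the formulas, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. The definitions of concentration, dispersion, and frequency, the way they are multiplied together, the logistic parameters `k` and `x0`, and the function names `feature_weights` and `select_features` are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def logistic(x, k=10.0, x0=0.5):
    """Logistic function; k and x0 are illustrative values, not from the paper."""
    return 1.0 / (1.0 + math.exp(-k * (x - x0)))


def feature_weights(docs, labels, target="spam"):
    """Weight every term for the target class.

    docs is a list of token lists and labels the matching class labels.
    The three factor definitions below are assumptions made for this sketch.
    """
    n_target = sum(1 for y in labels if y == target)
    if n_target == 0:
        raise ValueError("no documents of the target class")

    df_all = Counter()     # documents (any class) containing the term
    df_target = Counter()  # target-class documents containing the term
    tf_target = Counter()  # total occurrences within the target class
    for tokens, y in zip(docs, labels):
        for term, count in Counter(tokens).items():
            df_all[term] += 1
            if y == target:
                df_target[term] += 1
                tf_target[term] += count

    weights = {}
    for term, df in df_all.items():
        dft = df_target[term]
        # Assumed concentration: share of the term's documents in the target class.
        concentration = dft / df
        # Assumed dispersion: share of target-class documents containing the term.
        dispersion = dft / n_target
        # Assumed frequency: mean in-class term count, squashed into [0, 1).
        mean_tf = tf_target[term] / dft if dft else 0.0
        frequency = mean_tf / (1.0 + mean_tf)
        # Assumed combination: a plain product of the three factors.
        weights[term] = logistic(concentration * dispersion * frequency)
    return weights


def select_features(weights, top_n):
    """Return the top_n terms with the largest weights."""
    return sorted(weights, key=weights.get, reverse=True)[:top_n]


if __name__ == "__main__":
    docs = [["free", "money", "free"], ["free", "offer"],
            ["meeting", "tomorrow"], ["money", "report", "meeting"]]
    labels = ["spam", "spam", "ham", "ham"]
    print(select_features(feature_weights(docs, labels), top_n=2))
```

Passing the combined score through a logistic curve keeps every weight in (0, 1) and pushes weak and strong terms apart, which is one plausible reason for using it; the paper itself should be consulted for the actual formulation.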
Source
《内蒙古大学学报(自然科学版)》
CAS
CSCD
PKU Core Journals
2010, No. 4, pp. 450-455 (6 pages)
Journal of Inner Mongolia University:Natural Science Edition