Abstract
This paper proposes a semi-supervised ISODATA clustering algorithm based on Set Pair Analysis (SPA) for network anomaly detection. It improves the original ISODATA algorithm in three respects. First, the algorithm directly processes data with mixed symbolic and numeric attributes and uses SPA to compute the distance between data records. Second, it processes labeled and unlabeled samples together, using a small number of labeled samples to guide the splitting stage of clustering. Third, the number of input parameters is reduced to two. Experiments on the KDD 99 intrusion detection dataset show that the algorithm achieves a detection rate of 95.62% with a false positive rate of 1.29%.
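The abstract does not give the distance formula, so the following is only a minimal sketch of how an SPA-style distance over mixed symbolic and numeric attributes might look. It assumes the distance is one minus the SPA identity degree (the fraction of attributes judged identical), that symbolic attributes count as identical only on an exact match, and that numeric attributes are normalized to [0, 1] and count as identical when they differ by at most a tolerance. The function name `spa_distance`, the parameter `tol`, and the sample records are hypothetical and not taken from the paper.

```python
from typing import Sequence, Union

Value = Union[str, float, int]


def spa_distance(x: Sequence[Value], y: Sequence[Value], tol: float = 0.1) -> float:
    """Return 1 - a, where a is the fraction of attributes judged identical.

    Assumed rule, not the paper's: strings match exactly, numbers match
    when they differ by at most `tol`.
    """
    if len(x) != len(y):
        raise ValueError("records must have the same number of attributes")
    if not x:
        raise ValueError("records must have at least one attribute")

    identical = 0
    for xv, yv in zip(x, y):
        if isinstance(xv, str) or isinstance(yv, str):
            # Symbolic attribute: identical only on exact match.
            identical += int(xv == yv)
        else:
            # Numeric attribute: identical when within the tolerance.
            identical += int(abs(xv - yv) <= tol)

    return 1.0 - identical / len(x)


# Two hypothetical connection records: protocol, service, and two
# numeric features already scaled to [0, 1].
r1 = ("tcp", "http", 0.20, 0.95)
r2 = ("tcp", "smtp", 0.25, 0.10)
print(spa_distance(r1, r2))  # 2 of 4 attributes identical -> 0.5
```

A distance of this kind lets a clustering loop compare records without first encoding symbolic fields as numbers, which is the property the abstract highlights.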
Source
《计算机工程与应用》
CSCD
Peking University Core Journal
2009, Issue 36, pp. 99-100, 231 (3 pages)
Computer Engineering and Applications
Funding
Joint project of the Beijing Municipal Education Commission and Beijing Jiaotong University, No. 353011535