Abstract
Noise detection algorithms in data mining are usually evaluated on real-world datasets from the UCI (University of California, Irvine) repository into which simulated random noise is injected, with the improvement in the mining algorithm's performance after noise removal serving as the measure of detection quality. Because the internal structure of real data is unknown, the level of random noise is uncertain, and only a single evaluation metric is used, such evaluation lacks a common standard and makes it hard to compare algorithms against one another. To address this, the paper first analyzes existing evaluation methods for noise detection algorithms and proposes an evaluation framework, together with its components, built on an artificial data generator. It then designs a rule-based standard data generator and a method for introducing a random noise model, and provides concrete evaluation metrics. Finally, it analyzes the rationality of the framework.
Source
《武汉大学学报(工学版)》
CAS
CSCD
PKU Core (Peking University core journal list)
2011, No. 5, pp. 676-680 (5 pages)
Engineering Journal of Wuhan University
Funding
National Natural Science Foundation of China (Grant No. 60573038)
Natural Science Foundation of Guangdong Province (Grant No. 81510521000009)
Keywords
noise detection
artificial data generator
evaluation metrics
UCI