
Noise detection algorithm evaluation based on artificial datasets generator
Abstract: Evaluations of noise detection algorithms in data mining mostly take real UCI (University of California, Irvine) datasets as benchmarks, inject simulated random noise, and measure detection quality by how much a mining algorithm's performance improves once the noise is removed. Because the internal structure of real data is unknown, the level of random noise is uncertain, and only a single evaluation index is used, such evaluations lack a common standard and make side-by-side comparison of algorithms difficult. This paper first analyzes existing methods for evaluating noise detection algorithms, then proposes an evaluation framework, with its components, based on an artificial data generator. It designs a rule-based standard data generator and a method for introducing a random noise model, provides concrete evaluation indices, and finally analyzes the rationality of the framework.
Authors: Yin Hua (尹华), Dong Hongbin (董红斌)
Source: Engineering Journal of Wuhan University (《武汉大学学报(工学版)》), indexed in CAS, CSCD, and the Peking University Core list; 2011, No. 5, pp. 676-680 (5 pages)
Funding: National Natural Science Foundation of China (Grant No. 60573038); Natural Science Foundation of Guangdong Province (Grant No. 81510521000009)
Keywords: noise detection; artificial datasets generator; evaluation index; UCI
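
The abstract outlines a pipeline: generate clean data from known rules, inject random noise at a known level, and score a detector against the known noise positions. The abstract does not give the paper's actual generator, noise model, or indices, so the following is only a minimal sketch under stated assumptions. The rule (`x1 + x2 > 1`), the label-flipping noise model, the toy detector, and precision/recall as the indices are all hypothetical choices made for illustration.

```python
import random


def generate_clean(n, rng):
    """Generate n records from a hypothetical rule: label is 1 when x1 + x2 > 1."""
    data = []
    for _ in range(n):
        x1, x2 = rng.random(), rng.random()
        data.append((x1, x2, int(x1 + x2 > 1.0)))
    return data


def inject_label_noise(data, noise_level, rng):
    """Flip the labels of a known fraction of records.

    Returns the noisy data and the set of flipped indices (the ground truth).
    """
    k = round(noise_level * len(data))
    flipped = set(rng.sample(range(len(data)), k))
    noisy = [
        (x1, x2, 1 - y) if i in flipped else (x1, x2, y)
        for i, (x1, x2, y) in enumerate(data)
    ]
    return noisy, flipped


def rule_violation_detector(data):
    """Toy detector (assumption): flag records whose label contradicts the rule.

    It knows the generating rule, so it acts as an oracle; a real detector
    under evaluation would be plugged in here instead.
    """
    return {i for i, (x1, x2, y) in enumerate(data) if y != int(x1 + x2 > 1.0)}


def precision_recall(detected, truth):
    """Score detected noise indices against the known injected ones."""
    tp = len(detected & truth)
    precision = tp / len(detected) if detected else 0.0
    recall = tp / len(truth) if truth else 0.0
    return precision, recall


rng = random.Random(0)  # fixed seed so the run is repeatable
clean = generate_clean(1000, rng)
noisy, truth = inject_label_noise(clean, 0.1, rng)
detected = rule_violation_detector(noisy)
precision, recall = precision_recall(detected, truth)
print(f"injected={len(truth)} precision={precision:.2f} recall={recall:.2f}")
```

Because both the rule and the injected positions are known, the detector can be scored directly on what it flagged, rather than indirectly through the performance of a downstream mining algorithm.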

