Abstract
It is a widely accepted maxim that decisions are no better than the data on which they are based. Data quality is an important factor in the effectiveness of data mining, yet it has not received sufficient attention. Addressing the role of data quality in data mining, this paper presents several major indicators for evaluating data quality and, drawing on theory from statistics and machine learning, analyzes methods for improving it. The paper emphasizes that improving data quality starts with controlling the quality of the data sources from which data are selected for the warehouse.
Source
《管理工程学报》
CSSCI
2002, No. 1, pp. 21-29 (9 pages)
Journal of Industrial Engineering and Engineering Management
Funding
Supported by the Natural Science Foundation of Jiangsu Province ( 76 0 5 730 0 72 )