Abstract
Preparing data for applications such as data mining raises a range of data cleaning problems, and successful data cleaning often requires the support of domain knowledge. This paper proposes a domain-knowledge-based data cleaning framework. With the support of domain experts, it derives cleaning rules from a sample of the data; an expert system engine then applies this knowledge to clean the entire data set. The framework is self-learning, continually refining its cleaning rules during the cleaning process. Its knowledge base is easy to extend, and the framework is fairly general.
Source
《信息技术与信息化》
2005, Issue 5, pp. 100-103 (4 pages)
Information Technology and Informatization