Abstract
Data mining is the process of extracting hidden knowledge from large volumes of raw data. Most data mining tools use rule discovery and decision-tree classification to find patterns and rules in data, and at their core is an inductive algorithm. Compared with traditional statistical methods, classification results obtained with machine learning techniques are easier to interpret. When mining a specific data set, however, a user or decision maker who lacks the relevant domain knowledge can hardly tell which inductive algorithm to choose, and so has to try several. With MLC++, decision makers can readily compare how effective different classification algorithms are on a given data set and select a suitable one. System developers can also use MLC++ to design hybrid algorithms.
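The comparison workflow described in the abstract can be illustrated with a minimal sketch. This is not MLC++ code and does not use its API. It is a pure-Python stand-in that assumes a toy, hand-made data set and two simple learners (a majority-class baseline and a 1-nearest-neighbor classifier), and scores each with k-fold cross-validation. All names here (`cross_val_accuracy`, `compare`, `DATA`, `ALGORITHMS`) are hypothetical and chosen only for illustration.

```python
from collections import Counter


def majority_classifier(train):
    """Baseline learner: always predict the most frequent training label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label


def nearest_neighbor_classifier(train):
    """1-NN learner: predict the label of the closest training point."""
    def predict(x):
        # Squared Euclidean distance is enough for ranking neighbors.
        _, y = min(
            train,
            key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x)),
        )
        return y
    return predict


def cross_val_accuracy(build, data, k=5):
    """Mean accuracy over k folds; fold i holds out rows i, i+k, i+2k, ..."""
    scores = []
    for i in range(k):
        test = data[i::k]
        train = [row for j, row in enumerate(data) if j % k != i]
        predict = build(train)
        correct = sum(predict(x) == y for x, y in test)
        scores.append(correct / len(test))
    return sum(scores) / k


# Toy data set (an assumption, not from the paper): two separated clusters.
DATA = [((float(i), float(i % 3)), "low") for i in range(10)] + \
       [((float(i + 20), float(i % 3)), "high") for i in range(10)]

# Candidate algorithms to compare on the same data set.
ALGORITHMS = {
    "majority": majority_classifier,
    "1-nn": nearest_neighbor_classifier,
}


def compare(data, k=5):
    """Return {algorithm name: cross-validated accuracy} for every candidate."""
    return {name: cross_val_accuracy(build, data, k)
            for name, build in ALGORITHMS.items()}


if __name__ == "__main__":
    # Print candidates from best to worst estimated accuracy.
    for name, acc in sorted(compare(DATA).items(), key=lambda kv: -kv[1]):
        print(f"{name}: {acc:.2f}")
```

Evaluating every candidate on the same folds keeps the comparison fair: differences in accuracy then come from the algorithms rather than from how the data happened to be split.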
Source
Computer Simulation (《计算机仿真》)
CSCD
2006, No. 4, pp. 103-105, 113 (4 pages)
Keywords
Data mining
Machine learning
Classification algorithms
Decision trees
Programming