
Research on a Data-Partitioning-Based Global Discretization Method for Continuous Attributes
Abstract: Real-world problems often involve continuous numeric attributes, yet many inductive learning algorithms are designed for discrete attribute spaces. Discretization as a data preprocessing step has therefore long received attention. Global discretization, which takes the relationships among all attributes into account, is one important approach. This paper proposes a global discretization algorithm based on data partitioning. It first applies a uniform scaling-up to the values of the example set on each continuous attribute, selects the attribute that carries the most clustering information, and uses it to divide the whole example set roughly into several partitions; it then performs clustering and merging within each partition. The method improves on the basic global discretization algorithm and was validated on soil classification data from an agricultural expert system.
Source: Journal of Hangzhou Dianzi University (Natural Sciences), 2006, No. 1, pp. 18-21 (4 pages)
Keywords: inductive learning; discretization; data partitioning; global discretization
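
The pipeline outlined in the abstract (scale, pick one attribute, partition, then cluster and merge per partition) can be illustrated with a minimal Python sketch. The abstract does not give the scaling rule, the measure of "clustering information", or the clustering and merging procedures, so everything concrete here is an assumption made for illustration and not the paper's method: min-max scaling onto [0, 100], a gap count as the proxy for clustering information, 1-D gap clustering inside each partition, and the hypothetical parameters `min_gap` and `tol`.

```python
def scale_columns(rows, upper=100.0):
    """Min-max scale every attribute (column) onto the common range [0, upper]."""
    scaled_cols = []
    for col in zip(*rows):
        lo, hi = min(col), max(col)
        span = (hi - lo) or 1.0  # constant column: avoid division by zero
        scaled_cols.append([(v - lo) / span * upper for v in col])
    return [list(r) for r in zip(*scaled_cols)]


def gap_cuts(values, min_gap):
    """1-D gap clustering: cut midway between sorted neighbours more than min_gap apart."""
    s = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(s, s[1:]) if b - a > min_gap]


def merge_cuts(cuts, tol):
    """Chain together cut points at most tol apart; return the mean of each chain."""
    groups = []
    for c in sorted(cuts):
        if groups and c - groups[-1][-1] <= tol:
            groups[-1].append(c)
        else:
            groups.append([c])
    return [sum(g) / len(g) for g in groups]


def partitioned_discretize(rows, min_gap=15.0, tol=5.0):
    """Return (pivot attribute index, {attribute index: cut points on the scaled axis})."""
    scaled = scale_columns(rows)
    n_attrs = len(scaled[0])

    # Assumed proxy for "most clustering information": the attribute with the most gaps.
    per_attr = [gap_cuts([r[j] for r in scaled], min_gap) for j in range(n_attrs)]
    pivot = max(range(n_attrs), key=lambda j: len(per_attr[j]))
    bounds = per_attr[pivot]

    # Coarse partition of the examples along the pivot attribute.
    parts = {}
    for r in scaled:
        idx = sum(r[pivot] > b for b in bounds)  # number of boundaries below the value
        parts.setdefault(idx, []).append(r)

    # Cluster inside each partition, then merge nearby cuts across partitions.
    cuts = {pivot: bounds}
    for j in range(n_attrs):
        if j == pivot:
            continue
        local = [c for p in parts.values()
                 for c in gap_cuts([r[j] for r in p], min_gap)]
        cuts[j] = merge_cuts(local, tol)
    return pivot, cuts
```

Calling `partitioned_discretize(rows)` on a list of equal-length numeric tuples returns the index of the attribute used for partitioning together with cut points per attribute, expressed on the scaled axis. Partitioning first means each local clustering only sees examples that are already close on the pivot attribute, which is one plausible reading of why the abstract splits the data before clustering.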
