A New Supervised Discretization Method Based on Bayes Measure

Times cited: 5

Abstract: Standard naive Bayes cannot handle continuous attributes. This paper proposes a new supervised discretization method based on a Bayes measure. Without prior knowledge, the method automatically finds the most appropriate number of intervals and the interval boundaries for discretization. It then applies the MDL principle to control the number of intervals, balancing the accuracy of the learned discretization against its complexity. Experiments on UCI machine learning data sets show that classification accuracy is substantially improved.
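The abstract does not define its Bayes measure or the exact form of its MDL criterion, so the following Python sketch only illustrates the general idea under stated assumptions. It uses a Dirichlet-multinomial marginal likelihood of the class labels in each interval as a hypothetical stand-in for the Bayes measure, a simple combinatorial code length for the chosen cut points as the MDL term, and dynamic programming over candidate cut points to find the best partition for every interval count. The names `interval_score`, `mdl_cost` and `discretize` and the parameters `alpha` and `max_intervals` are hypothetical; this is not the authors' algorithm.

```python
import math
from collections import Counter
from typing import List, Sequence


def interval_score(labels: Sequence[int], n_classes: int, alpha: float = 1.0) -> float:
    """Log Dirichlet-multinomial marginal likelihood (in nats) of the class
    labels that fall in one interval. Assumed stand-in for the Bayes measure."""
    n = len(labels)
    score = math.lgamma(n_classes * alpha) - math.lgamma(n + n_classes * alpha)
    for count in Counter(labels).values():  # absent classes contribute 0
        score += math.lgamma(count + alpha) - math.lgamma(alpha)
    return score


def mdl_cost(n_candidates: int, n_intervals: int) -> float:
    """Hypothetical MDL term (in nats): log of the number of ways to choose
    n_intervals - 1 cut points out of n_candidates possible positions."""
    k = n_intervals - 1
    return (math.lgamma(n_candidates + 1) - math.lgamma(k + 1)
            - math.lgamma(n_candidates - k + 1))


def discretize(values: Sequence[float], labels: Sequence[int],
               n_classes: int, max_intervals: int = 10) -> List[float]:
    """Return cut points maximizing (sum of interval scores) - MDL cost."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    xs = [values[i] for i in order]
    ys = [labels[i] for i in order]
    # Candidate boundaries: sorted positions where the value changes,
    # so equal values are never split across intervals.
    bounds = [0] + [j for j in range(1, len(xs)) if xs[j] != xs[j - 1]] + [len(xs)]
    m = len(bounds) - 1  # number of blocks of equal values
    max_k = min(max_intervals, m)
    neg = float("-inf")
    # best[k][b]: best total score for splitting blocks [0, b) into k intervals.
    best = [[neg] * (m + 1) for _ in range(max_k + 1)]
    back = [[0] * (m + 1) for _ in range(max_k + 1)]
    best[0][0] = 0.0
    for k in range(1, max_k + 1):
        for b in range(k, m + 1):
            for a in range(k - 1, b):
                if best[k - 1][a] == neg:
                    continue
                s = best[k - 1][a] + interval_score(ys[bounds[a]:bounds[b]], n_classes)
                if s > best[k][b]:
                    best[k][b], back[k][b] = s, a
    # Choose the interval count that best trades fit against MDL cost.
    k_best = max(range(1, max_k + 1), key=lambda k: best[k][m] - mdl_cost(m - 1, k))
    cuts, b = [], m
    for k in range(k_best, 0, -1):  # walk the back-pointers
        a = back[k][b]
        if a > 0:
            j = bounds[a]
            cuts.append((xs[j - 1] + xs[j]) / 2)
        b = a
    return sorted(cuts)


if __name__ == "__main__":
    values = [1.0, 2.0, 3.0, 4.0, 10.0, 11.0, 12.0, 13.0]
    labels = [0, 0, 0, 0, 1, 1, 1, 1]
    print(discretize(values, labels, n_classes=2))  # prints [7.0]
```

In the toy example, two pure intervals outscore both a single mixed interval and any finer split, so exactly one cut is chosen between the classes. The dynamic program is exact for each interval count but needs roughly O(K·m²) interval evaluations, where m is the number of distinct values; a greedy top-down splitting search would be a cheaper approximation on large data sets.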

Authors: Li Haijun (李海军), Wang Zhengxuan (王钲旋), Wang Limin (王利民), Yuan Senmiao (苑森淼)

Affiliations: School of Computer Science, Yantai University; College of Computer Science and Technology, Jilin University

Source: Chinese Journal of Scientific Instrument (《仪器仪表学报》), 2005, No. 8, pp. 786-789 (4 pages). Indexed in EI, CAS and CSCD; Peking University core journal.

Funding: Supported by the National Natural Science Foundation of China (Grant No. 60275026).

Keywords: Machine learning; Naive Bayes; Supervised discretization; Bayes measure

CLC number: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]
