A Semi-Random Multiple Decision-Tree Algorithm for Mining Data Streams (cited by: 5)

Abstract: Mining with streaming data is a hot topic in data mining. When performing classification on data streams, traditional classification algorithms based on decision trees, such as ID3 and C4.5, have relatively poor efficiency in both time and space due to the characteristics of streaming data. Random decision trees offer some advantages in time and space. This paper proposes SRMTDS (Semi-Random Multiple decision Trees for Data Streams), an incremental algorithm for mining data streams that is based on random decision trees. SRMTDS uses the inequality of Hoeffding bounds to choose the minimum number of split-examples, a heuristic method to compute the information gain for obtaining the split thresholds of numerical attributes, and a Naive Bayes classifier to estimate the class labels of tree leaves. Our extensive experimental study shows that SRMTDS has improved performance in time, space, accuracy and anti-noise capability in comparison with VFDTc, a state-of-the-art decision-tree algorithm for classifying data streams.
Source: Journal of Computer Science & Technology (indexed in SCIE, EI, CSCD), 2007, Issue 5, pp. 711-724 (14 pages). Chinese title: 计算机科学技术学报(英文版)
Funding: This research is supported by the National Natural Science Foundation of China (Grant No. 60573174) and the Natural Science Foundation of Anhui Province of China (Grant No. 050420207).
Keywords: data streams, Naive Bayes, random decision trees
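
The abstract says SRMTDS uses the Hoeffding bound to choose the minimum number of split-examples, but it does not give the formula or the parameter values. The sketch below therefore shows only the standard Hoeffding bound as it is commonly used in stream decision trees, not the paper's exact procedure. The function names `hoeffding_bound` and `min_examples`, and the values of `value_range`, `delta`, and `epsilon`, are assumptions chosen for illustration.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Standard Hoeffding bound (illustrative, not taken from the paper).

    For a random variable with range `value_range`, after n independent
    observations the true mean is, with probability at least 1 - delta,
    no smaller than the observed mean minus the returned epsilon.
    """
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def min_examples(value_range: float, delta: float, epsilon: float) -> int:
    """Smallest n for which the Hoeffding bound is at most `epsilon`.

    Obtained by solving epsilon = sqrt(R^2 * ln(1/delta) / (2n)) for n
    and rounding up to a whole number of examples.
    """
    return math.ceil(
        value_range ** 2 * math.log(1.0 / delta) / (2.0 * epsilon ** 2)
    )


if __name__ == "__main__":
    # Hypothetical settings: range 1.0 (e.g. information gain in bits for
    # a two-class problem), confidence 1 - 1e-7, tolerance 0.05.
    n = min_examples(value_range=1.0, delta=1e-7, epsilon=0.05)
    print(n, hoeffding_bound(1.0, 1e-7, n))
```

Solving the bound for n gives a fixed number of examples a leaf must see before a split decision can be trusted at the chosen confidence, which is one plausible reading of "minimum number of split-examples" in the abstract.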