Research on Algorithm of Tree Stream Classification Based on k-best Tree Pattern

Abstract: Most existing methods for classifying structured data are based on frequent substructure mining. The frequent substructures are ordered and pruned, then associated with class values to obtain structural rules, which are used for classification. This paper proposes TSC, an algorithm for classifying tree-structured data streams based on significant tree patterns. First, TSC uses correlation measures to find the k most discriminative tree patterns correlated with the class values. During this search, TSC uses branch and bound to improve efficiency without mining the complete set of frequent patterns. It also continuously updates the threshold, which avoids a post-pruning step, so the resulting tree patterns can be used directly for classification. Unlike existing methods, TSC is not heuristic and requires the user to set only the maximum size of the rule set. TSC then uses the classical ADWIN approach to handle local concept drift in evolving tree streams. Experimental results show that, compared with previous methods, TSC is simple and efficient: it generates a smaller set of effective rules, which greatly reduces testing time, and it achieves a relatively short total running time with higher predictive accuracy.
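The search step described above (find the k best patterns by a correlation measure, prune branches with branch and bound, and keep raising the threshold as better patterns are found) can be illustrated with a minimal Python sketch. It is not TSC itself and makes several assumptions of its own: it enumerates itemsets instead of tree patterns, uses the chi-square statistic as the correlation measure, and bounds a branch by evaluating chi-square at the corners of the (positive, negative) coverage box, which is valid because chi-square is convex in those two counts. The function names `chi_square`, `upper_bound`, and `top_k_patterns` are hypothetical.

```python
import heapq
from itertools import count


def chi_square(x, y, n_pos, n_neg):
    """Chi-square statistic of the 2x2 table (pattern present/absent x class).

    x: positive examples covered by the pattern; y: negative examples covered.
    """
    n = n_pos + n_neg
    covered = x + y
    if covered == 0 or covered == n:
        return 0.0  # the pattern does not split the data, so it is uninformative
    chi = 0.0
    for observed, row_total, col_total in (
        (x, covered, n_pos),
        (y, covered, n_neg),
        (n_pos - x, n - covered, n_pos),
        (n_neg - y, n - covered, n_neg),
    ):
        expected = row_total * col_total / n
        if expected > 0:
            chi += (observed - expected) ** 2 / expected
    return chi


def upper_bound(x, y, n_pos, n_neg):
    """Largest score any refinement of a pattern covering (x, y) could reach.

    A refinement covers (x', y') with 0 <= x' <= x and 0 <= y' <= y; chi-square
    is convex in (x, y), so its maximum over that box lies at a corner.
    """
    return max(chi_square(a, b, n_pos, n_neg) for a in (0, x) for b in (0, y))


def top_k_patterns(data, k):
    """Return the k itemsets with the highest chi-square score, best first.

    data: list of (frozenset_of_items, label) pairs with label in {0, 1}.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    n_pos = sum(label for _, label in data)
    n_neg = len(data) - n_pos
    items = sorted({item for itemset, _ in data for item in itemset})
    heap = []  # min-heap of (score, insertion order, pattern)
    order = count()

    def threshold():
        # The k-th best score so far; it only rises, tightening the pruning.
        return heap[0][0] if len(heap) == k else 0.0

    def expand(pattern, cover, start):
        # Set-enumeration DFS: each itemset is generated exactly once.
        for idx in range(start, len(items)):
            item = items[idx]
            new_cover = [r for r in cover if item in data[r][0]]
            if not new_cover:
                continue  # every extension of an empty cover is also empty
            x = sum(data[r][1] for r in new_cover)
            y = len(new_cover) - x
            new_pattern = pattern + (item,)
            score = chi_square(x, y, n_pos, n_neg)
            if score > threshold():
                heapq.heappush(heap, (score, next(order), new_pattern))
                if len(heap) > k:
                    heapq.heappop(heap)  # drop the pattern now ranked k+1
            # Branch and bound: descend only if some refinement could still
            # beat the current k-th best score.
            if upper_bound(x, y, n_pos, n_neg) > threshold():
                expand(new_pattern, new_cover, idx + 1)

    expand((), list(range(len(data))), 0)
    return [(pattern, score) for score, _, pattern in sorted(heap, reverse=True)]
```

Because the threshold is always the current k-th best score, the heap already holds the final top-k patterns when the search ends, so no separate post-pruning pass is needed. Patterns that score 0 are never reported, so fewer than k pairs may be returned on small data.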
Source: Journal of Chinese Computer Systems, 2013, No. 6, pp. 1328-1333 (6 pages). Indexed in CSCD and the Peking University Core Journals list.
Keywords: tree stream; classification; k-best tree pattern; correlation measures