Research on Algorithm of Tree Stream Classification Based on k-best Tree Pattern

Abstract: Most existing methods for classifying structured data are based on frequent substructure mining. The frequent substructures are ranked and pruned, then associated with class values to obtain structural rules, which are used for classification. This paper proposes TSC, a classification method for tree-structured data streams based on significant tree patterns. First, TSC uses correlation measures to find the k tree patterns that are most discriminative with respect to the class values. During this search, it applies branch-and-bound to improve efficiency, so the complete set of frequent patterns never has to be mined. It also continually updates the pruning threshold, which avoids a post-pruning step, so the resulting tree patterns can be used directly for classification. Moreover, unlike previous methods, TSC is heuristic-free: the user only sets the maximum size of the rule set. TSC then adopts the classic ADWIN approach to handle local concept drift in evolving tree streams. Experiments show that, compared with previous methods, TSC generates fewer effective rules, which greatly reduces testing time. It also achieves high accuracy with a relatively short total running time, making it simple and efficient.
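The abstract describes the k-best search only at a high level. What follows is an illustrative reconstruction under stated assumptions, not the authors' TSC code.

The assumptions are:
- Class labels are binary.
- Chi-square stands in for the unspecified "correlation measures".
- Tree patterns are restricted to downward label paths, a simple family of tree patterns whose coverage can only shrink as a pattern grows.

The search works as follows:
- The k best patterns sit in a min-heap. Its smallest score acts as the continually raised pruning threshold.
- A branch is cut when an upper bound on any extension's score cannot beat that threshold. The bound relies on the known convexity of chi-square in the covered counts (p, n).
- The surviving patterns are used directly as rules, matching the abstract's claim that no post-pruning is needed.

All names (`downward_paths`, `chi2_bound`, `top_k_patterns`, `classify`) and the tree encoding are hypothetical. ADWIN-based drift handling is not shown.

```python
# Illustrative sketch, NOT the authors' TSC code. Assumptions: binary labels,
# chi-square as the correlation measure, and tree patterns restricted to
# downward label paths. All function names are hypothetical.
import heapq
from typing import List, Set, Tuple

Tree = Tuple[str, list]  # hypothetical encoding: (label, [child trees])
Path = Tuple[str, ...]   # a path pattern: labels read top-down


def downward_paths(tree: Tree, max_len: int) -> Set[Path]:
    """All label sequences along downward paths from any node, up to max_len."""
    paths: Set[Path] = set()

    def walk(node: Tree, prefix: Path) -> None:
        seq = prefix + (node[0],)
        paths.add(seq)
        if len(seq) < max_len:
            for child in node[1]:
                walk(child, seq)

    def visit(node: Tree) -> None:
        walk(node, ())  # paths that start at this node
        for child in node[1]:
            visit(child)

    visit(tree)
    return paths


def chi2(p: int, n: int, P: int, N: int) -> float:
    """Chi-square of the 2x2 table: pattern covers p of P positives, n of N negatives."""
    total, cov = P + N, p + n
    if cov in (0, total) or P == 0 or N == 0:
        return 0.0
    rest = total - cov
    cells = [(p, cov * P), (n, cov * N), (P - p, rest * P), (N - n, rest * N)]
    return sum((obs - e / total) ** 2 / (e / total) for obs, e in cells)


def chi2_bound(p: int, n: int, P: int, N: int) -> float:
    """Upper bound on chi2 for every extension of a pattern covering (p, n).

    Extensions cover p' <= p and n' <= n, and chi2 is convex in (p', n'),
    so its maximum over that rectangle lies at a corner ((0, 0) scores 0)."""
    return max(chi2(p, 0, P, N), chi2(0, n, P, N), chi2(p, n, P, N))


def top_k_patterns(trees: List[Tree], labels: List[int], k: int, max_len: int = 4):
    """Branch-and-bound search for the k path patterns with the highest chi2."""
    P = sum(labels)
    N = len(labels) - P
    tree_paths = [downward_paths(t, max_len) for t in trees]
    alphabet = sorted({seq[0] for paths in tree_paths for seq in paths})
    heap: list = []  # min-heap of (score, pattern, predicted label); root = k-th best

    def dfs(pattern: Path, cover: List[int]) -> None:
        p = sum(labels[i] for i in cover)
        n = len(cover) - p
        entry = (chi2(p, n, P, N), pattern, 1 if p * N > n * P else 0)
        if len(heap) < k:
            heapq.heappush(heap, entry)
        elif entry[0] > heap[0][0]:
            heapq.heapreplace(heap, entry)  # raises the pruning threshold
        if len(pattern) == max_len:
            return
        if len(heap) == k and chi2_bound(p, n, P, N) <= heap[0][0]:
            return  # no extension can beat the current k-th best score
        for lab in alphabet:
            ext = pattern + (lab,)
            sub = [i for i in cover if ext in tree_paths[i]]  # cover only shrinks
            if sub:
                dfs(ext, sub)

    for lab in alphabet:
        dfs((lab,), [i for i in range(len(trees)) if (lab,) in tree_paths[i]])
    return sorted(heap, reverse=True)  # best rule first


def classify(tree: Tree, rules, default: int, max_len: int = 4) -> int:
    """Predict with the highest-scoring rule whose pattern occurs in the tree."""
    paths = downward_paths(tree, max_len)
    for _score, pattern, label in rules:
        if pattern in paths:
            return label
    return default


trees = [
    ("a", [("b", []), ("d", [])]),
    ("a", [("b", [("d", [])])]),
    ("a", [("c", [])]),
    ("a", [("c", [("d", [])])]),
]
labels = [1, 1, 0, 0]
rules = top_k_patterns(trees, labels, k=2)
print([pattern for _, pattern, _ in rules])  # [('a', 'c'), ('a', 'b')]
print(classify(("a", [("b", [])]), rules, default=1))  # 1
```

As in the abstract, the only user-set size parameter here is `k`, the maximum number of rules. Ties at the k-th score are kept first-come, so other tie-breaking policies are equally valid choices.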

Authors: Jia Minjie, Wang Liming

Affiliation: School of Information Engineering, Zhengzhou University

Source: Journal of Chinese Computer Systems (小型微型计算机系统), 2013, No. 6, pp. 1328-1333 (6 pages); indexed in CSCD and the Peking University core journal list

Keywords: tree stream classification; k-best tree pattern; correlation measures

CLC number: TP18 [Automation and Computer Technology / Control Theory and Control Engineering]
