A Top-k High Utility Itemset Mining Method Based on the Index Utility

Cited by: 3

Abstract: To preserve the downward closure property, existing Top-k high utility itemset mining methods substitute the transaction utility of an itemset for its real utility. This overestimates itemset utilities, which weakens pruning and lowers mining efficiency. To address this problem, the concept of index utility is proposed. On this basis, a two-level index is built and index pruning is applied, which strengthens pruning during mining and improves the efficiency of Top-k high utility itemset mining. In addition, a utility matrix is built to support fast calculation of itemset utilities, which improves mining efficiency further. Experiments on different types of datasets validate the effectiveness and efficiency of the proposed method.
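The overestimation described in the abstract can be illustrated with a minimal sketch. It is not the paper's index-utility method or its two-level index, which the abstract does not specify. It only contrasts the real utility of an itemset with the transaction-based upper bound that conventional methods use, and then finds the Top-k itemsets by brute force. The toy transactions, the unit profits, and the function names `utility`, `twu`, and `top_k` are all assumptions made up for illustration.

```python
from itertools import combinations

# Hypothetical toy data (not from the paper): item -> purchased quantity.
transactions = [
    {"a": 1, "b": 2, "c": 1},
    {"a": 2, "c": 3},
    {"b": 1, "c": 1, "d": 4},
]
# Hypothetical unit profit of each item.
unit_profit = {"a": 5, "b": 2, "c": 1, "d": 3}


def transaction_utility(t):
    """Total utility of all items in one transaction."""
    return sum(qty * unit_profit[item] for item, qty in t.items())


def utility(itemset):
    """Real utility: utility of the itemset's own items, summed over
    the transactions that contain every item of the itemset."""
    return sum(
        sum(t[item] * unit_profit[item] for item in itemset)
        for t in transactions
        if all(item in t for item in itemset)
    )


def twu(itemset):
    """Transaction-based upper bound: whole-transaction utilities,
    summed over the transactions that contain the itemset."""
    return sum(
        transaction_utility(t)
        for t in transactions
        if all(item in t for item in itemset)
    )


def top_k(k):
    """Brute-force Top-k by real utility (no pruning, toy sizes only)."""
    items = sorted(unit_profit)
    scored = []
    for n in range(1, len(items) + 1):
        for combo in combinations(items, n):
            u = utility(combo)
            if u > 0:
                scored.append((u, list(combo)))
    # Highest utility first; ties broken by item names for determinism.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return scored[:k]


if __name__ == "__main__":
    for itemset in (("a",), ("a", "c"), ("c",)):
        print(itemset, "utility =", utility(itemset), "bound =", twu(itemset))
    print(top_k(3))
```

In this toy data the real utility grows from {a} to {a, c}, so real utility is not downward closed, while the transaction-based bound never grows when items are added. The bound for {c} is several times its real utility, which is the kind of loose estimate the abstract identifies as the cause of weak pruning.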

Authors: Lin Shukuan (林树宽), Wang Xiaocong (王晓丛), Qiao Jianzhong (乔建忠), Wang Rui (王蕊)

Affiliation: School of Information Science and Engineering, Northeastern University (东北大学)

Source: Journal of Northeastern University (Natural Science), 2016, No. 1, pp. 24-28 (5 pages). Indexed in: EI, CAS, CSCD, Peking University Core Journals (北大核心).

Funding: National Natural Science Foundation of China (61272177)

Keywords: itemset utility; index utility; Top-k high utility itemset; ending super itemset; utility matrix

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
