基于贝叶斯网络的频繁模式兴趣度计算及剪枝 (cited by: 4)

Computing and Pruning Method for Frequent Pattern Interestingness Based on Bayesian Networks

Abstract: Using a Bayesian network to represent domain knowledge, this paper proposes BN-EJTR, a domain-knowledge-based method for computing the interestingness of frequent itemsets and frequent attribute sets and for pruning them. Its goal is to discover knowledge that is inconsistent with the current domain knowledge, thereby addressing the interestingness and redundancy problems of frequent pattern mining. To meet the need for batch inference during interestingness computation, BN-EJTR provides a Bayesian network inference algorithm based on extended junction tree elimination, which computes the support of a large number of itemsets in the Bayesian network. BN-EJTR also provides a pruning algorithm based on an interestingness threshold and topological interestingness. Experimental results show that, compared with similar methods, BN-EJTR has good time performance and prunes effectively. Analysis shows that the frequent attribute sets and frequent itemsets remaining after pruning satisfy the interestingness requirement relative to the domain knowledge.
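The abstract does not state the interestingness formula, so the following Python sketch is only an illustration under stated assumptions. It assumes interestingness is the absolute difference between an itemset's support in the data and its probability under the Bayesian network, a definition used in related work on Bayesian-network-based interestingness. The two-node network, its probabilities, and all function names are hypothetical. Brute-force enumeration stands in for the extended junction tree elimination that BN-EJTR uses, and the topological interestingness criterion is not modeled.

```python
from itertools import product

# Hypothetical toy network A -> B over binary attributes (not from the paper).
PARENTS = {"A": (), "B": ("A",)}
# CPT[var][parent_values] = P(var = 1 | parents)
CPT = {
    "A": {(): 0.5},
    "B": {(0,): 0.2, (1,): 0.8},
}
ORDER = ["A", "B"]  # a topological order of the network


def joint(assign):
    """Probability of one full assignment under the network."""
    p = 1.0
    for var in ORDER:
        p_one = CPT[var][tuple(assign[q] for q in PARENTS[var])]
        p *= p_one if assign[var] == 1 else 1.0 - p_one
    return p


def bn_support(itemset):
    """Marginal probability of a partial assignment.

    Brute-force enumeration, used here only as a stand-in for
    junction-tree inference; it is exponential in the number of variables.
    """
    total = 0.0
    for values in product((0, 1), repeat=len(ORDER)):
        assign = dict(zip(ORDER, values))
        if all(assign[v] == x for v, x in itemset.items()):
            total += joint(assign)
    return total


def data_support(itemset, rows):
    """Fraction of records that match the partial assignment."""
    hits = sum(all(r[v] == x for v, x in itemset.items()) for r in rows)
    return hits / len(rows)


def interestingness(itemset, rows):
    """Assumed measure: |support in data - support in network|."""
    return abs(data_support(itemset, rows) - bn_support(itemset))


def prune(itemsets, rows, eps):
    """Keep only itemsets whose interestingness reaches the threshold eps."""
    return [s for s in itemsets if interestingness(s, rows) >= eps]


# Hypothetical data in which A and B co-occur less often than the network expects.
rows = (
    [{"A": 1, "B": 1}] * 2 + [{"A": 1, "B": 0}] * 3
    + [{"A": 0, "B": 1}] * 3 + [{"A": 0, "B": 0}] * 2
)
candidates = [{"A": 1}, {"B": 1}, {"A": 1, "B": 1}]
kept = prune(candidates, rows, eps=0.1)
```

In this toy setting the single-attribute itemsets agree with the network and are pruned, while the pair `{"A": 1, "B": 1}` survives because the data contradict the dependency encoded in the network.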

Authors: Hu Chunling (胡春玲), Wu Xindong (吴信东), Hu Xuegang (胡学钢), Yao Hongliang (姚宏亮)

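Affiliations: School of Computer and Information, Hefei University of Technology (合肥工业大学计算机与信息学院); Key Laboratory of Network and Intelligent Information Processing, Hefei University (合肥学院网络与智能信息处理重点实验室)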

Source: Journal of Software (《软件学报》), indexed in EI, CSCD, and the Peking University Core Journals list; 2011, No. 12, pp. 2934-2950 (17 pages)

Funding: National Natural Science Foundation of China (60828005, 60975034, 61070131)

Keywords: frequent pattern; Bayesian network; junction tree; interestingness; pruning

Classification code: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]
