
Frequent Itemset Mining Segmentation Algorithm Based on Vertical Format (cited by: 3)
Abstract: To address the time-consuming join and pruning steps of the Eclat algorithm, the data set is partitioned into equivalence classes according to the joinability of itemsets and stored in segments; with an end-item pruning strategy, the join and pruning operations are completed in constant time. To address the heavy computation that Eclat needs to intersect long sets, the transaction sets of itemsets are stored in segments in a multidimensional array, so that intersecting long sets is converted into intersecting short sets segment by segment. The concept of expected support is also proposed: support is predicted during the intersection, which reduces the number of comparisons. Experimental results show that the algorithm outperforms Eclat in running time and is especially suitable for mining sparse data sets with long patterns.
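The ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define the end-item pruning strategy or the exact formula for expected support, so the sketch omits the former and substitutes a simple optimistic upper bound for the latter (matches found so far plus the most the remaining segments could contribute). All function names and the `seg_size` parameter are hypothetical, and a dictionary keyed by segment index stands in for the multidimensional array.

```python
from collections import defaultdict


def to_vertical(transactions):
    """Vertical format: map each item to the sorted list of transaction ids."""
    tidlists = defaultdict(list)
    for tid, items in enumerate(transactions):
        for item in set(items):
            tidlists[item].append(tid)
    return dict(tidlists)


def segment(tids, seg_size):
    """Split a tid list into short segments keyed by tid // seg_size."""
    segs = defaultdict(list)
    for t in tids:
        segs[t // seg_size].append(t)
    return dict(segs)


def support(segs):
    """Support of an itemset = total number of tids over all segments."""
    return sum(len(v) for v in segs.values())


def intersect_with_bound(a, b, min_sup):
    """Intersect two segmented tid lists one segment at a time.

    Assumption: the bound below is a stand-in for the paper's expected
    support. If matches so far plus the best case for the remaining
    segments falls below min_sup, stop early and return None.
    """
    keys = sorted(set(a) & set(b))
    # remaining[i] = largest possible number of matches in segments keys[i:]
    remaining = [0] * (len(keys) + 1)
    for i in range(len(keys) - 1, -1, -1):
        remaining[i] = remaining[i + 1] + min(len(a[keys[i]]), len(b[keys[i]]))
    result, count = {}, 0
    for i, k in enumerate(keys):
        if count + remaining[i] < min_sup:
            return None  # cannot reach min_sup; skip the remaining segments
        common = sorted(set(a[k]) & set(b[k]))
        if common:
            result[k] = common
            count += len(common)
    return result if count >= min_sup else None


def eclat_segmented(transactions, min_sup, seg_size=4):
    """Eclat-style depth-first search over prefix equivalence classes."""
    vertical = to_vertical(transactions)
    level1 = [((item,), segment(tids, seg_size))
              for item, tids in sorted(vertical.items())
              if len(tids) >= min_sup]
    frequent = {}

    def mine(cls):
        # Every itemset in cls shares the same prefix (all but the last
        # item), so any two members can be joined directly.
        for i, (iset_a, segs_a) in enumerate(cls):
            frequent[iset_a] = support(segs_a)
            next_cls = []
            for iset_b, segs_b in cls[i + 1:]:
                joined = intersect_with_bound(segs_a, segs_b, min_sup)
                if joined is not None:
                    next_cls.append((iset_a + iset_b[-1:], joined))
            if next_cls:
                mine(next_cls)

    mine(level1)
    return frequent
```

Calling `eclat_segmented(transactions, min_sup)` returns a dictionary from item tuples to support counts. The early exit in `intersect_with_bound` is where the saving comes from: when two tid lists share few segments, the candidate is rejected before any element-wise comparison is made.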
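Source: Journal of Jilin University (Science Edition), indexed in CAS, CSCD, and the Peking University Core Journals list; 2016, No. 3, pp. 553-560 (8 pages).
Funding: National Natural Science Foundation of China (Grant No. 61133011); "Twelfth Five-Year Plan" Science and Technology Research Project of the Education Department of Jilin Province (Grant No. 2013431).
Keywords: frequent itemset; vertical format; segmented storage; expected support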
