
A Transaction-Sampling Association Rule Mining Algorithm Based on Clustering (cited by: 3)

Association rules mining on subset of raw data based on clustering
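Abstract  Apriori, the classic association rule mining algorithm, must scan the database many times while generating frequent itemsets, so it is costly in both space and time. Many improved versions of Apriori have since appeared, but most approach the problem from the angle of data storage structures; few studies consider the properties of the data set itself. To address this, a transaction-sampling association rule mining algorithm based on clustering is proposed: the transactions are clustered to obtain a transaction subset that reflects the characteristics of the original transaction data, and mining and analysis are then carried out on that subset. The method was tested on 8 synthetic data sets of different sizes and 1 real data set. On the smaller synthetic data sets it saved 0.03 s relative to the original method; the savings grew with scale, reaching 70 s on a data set of size 15 000 and dimension 30; on the real data set, under different parameter settings, its running time was only 50% of the original method's. The experiments show that the method is more efficient than traditional Apriori, with larger gains as the data volume grows. The idea can also be extended to other improved Apriori algorithms.

English abstract  Association rule mining is an important research branch of data mining. Its typical algorithm, Apriori, has a serious drawback: it needs to scan the data set many times and consumes much time and memory. When both the data size and the dimension are very large, this may be intolerable. With the arrival of the big data era, finding frequent itemsets is becoming more and more difficult. To solve this problem, the authors propose a new method based on clustering and the typical Apriori algorithm. It first finds a representative subset of the raw data set with a clustering algorithm, and then mines and analyzes that subset. Experiments were carried out on 8 toy data sets of different sizes and a real data set of game-property transactions. On the toy data, the method reduced running time by 0.03 seconds and 70 seconds on data sets of size 200 × 10 and 15 000 × 20, respectively. On the real data set, the method took only half the time of the original method. The experimental results demonstrate the effectiveness of the method.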
Author  MA Yuling (马玉玲)
Source  Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journals list, 2015, Issue A02, pp. 77-79, 84 (4 pages)
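Funding  Science and Technology Program of Shandong Higher Education Institutions (J15LN55); Shandong Vocational and Adult Education Research Planning Project (2014zcj015); Shandong Provincial Teaching Reform Project (YCXY-X2014011)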
Keywords  clustering algorithm; transaction subset; association rule mining; Apriori algorithm
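
The pipeline outlined in the abstract (cluster the transactions, keep a representative subset, run Apriori on that subset) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the author's implementation: the abstract names neither the clustering algorithm nor the rule for choosing the subset, so the leader-style clustering with Jaccard similarity, the proportional per-cluster sampling, and the names `leader_cluster`, `sample_subset`, `threshold` and `fraction` are all hypothetical choices made for this example.

```python
import random
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two item sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def leader_cluster(transactions, threshold):
    """Assumed clustering step: put each transaction into the first cluster
    whose leader (its first member) is at least `threshold` similar."""
    clusters = []  # list of (leader, members) pairs
    for t in transactions:
        for leader, members in clusters:
            if jaccard(leader, t) >= threshold:
                members.append(t)
                break
        else:
            clusters.append((t, [t]))
    return [members for _, members in clusters]


def sample_subset(clusters, fraction, seed=0):
    """Assumed sampling step: draw the same fraction from every cluster
    (at least one transaction each) so the subset keeps the cluster mix."""
    rng = random.Random(seed)
    subset = []
    for members in clusters:
        k = max(1, round(len(members) * fraction))
        subset.extend(rng.sample(members, k))
    return subset


def apriori(transactions, min_support):
    """Plain Apriori: return {frequent itemset: support}."""
    n = len(transactions)
    items = {item for t in transactions for item in t}
    current = [frozenset([item]) for item in items]
    frequent = {}
    k = 1
    while current:
        # One pass over the data per level to count candidate supports.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-itemsets, then prune candidates that have
        # an infrequent (k-1)-subset.
        joined = {a | b for a, b in combinations(list(level), 2) if len(a | b) == k}
        current = [
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]
    return frequent


# Toy data: two groups of identical transactions.
transactions = [frozenset("ab")] * 50 + [frozenset("cd")] * 50
subset = sample_subset(leader_cluster(transactions, threshold=0.5), fraction=0.2)
itemsets_on_subset = apriori(subset, min_support=0.4)
```

Because Apriori rescans the data once per itemset size, running it on the smaller `subset` cuts the cost of every pass. Whether the subset yields the same itemsets as the full data depends on how well the clusters capture its structure; the toy data here is deliberately clean.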


