
A New Algorithm for Effectively Mining Multi-dimension Multi-level Association Rules  Cited by: 9

Effectively Mining Multi-dimension Multi-level Association Rules
Abstract  Association rule discovery plays an important role in data mining. Most proposed algorithms are based on Apriori, which scans the database as many times as the length of the longest pattern. This makes them inefficient for mining multi-dimension multi-level rules, where patterns longer than 20 items are not uncommon. Moreover, current approaches handle quantitative attributes by merging adjacent ranges into simple concept hierarchies, which are too simple to be useful in real applications. To address these problems, this paper presents a method based on information entropy for partitioning quantitative intervals or sets of qualitative values, and proposes an approach to concept hierarchy formation that combines automatic generation with user interaction. On this basis, the multi-dimension, multi-level, multi-datatype problem is transformed into a constrained single-dimension, single-level Boolean problem, so that such rules can be mined by constrained single-dimension single-level Boolean algorithms. FPT-Gen, an algorithm the authors recently proposed for generating frequent patterns directly, is discussed in detail, and the design of a new algorithm derived from it, MDML-FPT-Gen, is presented. Experimental evaluations show that MDML-FPT-Gen is more efficient and scalable than classical Apriori-based algorithms.
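Source  Journal of Nanjing University (Natural Science), indexed in CAS, CSCD, and the Peking University Core Journals list, 2003, No. 2, pp. 205-210 (6 pages)
Funding  Natural Science Foundation of Zhejiang Province (602140); Scientific Research Program of the Zhejiang Provincial Department of Education (20020635)
Keywords  data mining; frequent pattern; multi-dimension multi-level association rules; knowledge discovery; FPT-Gen algorithm; information entropy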
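
The abstract's central step, recasting multi-dimension multi-level data as a constrained single-dimension single-level Boolean problem, can be illustrated with a minimal sketch. This is not the paper's MDML-FPT-Gen algorithm: the hierarchy, the item naming scheme `dimension=value`, and the function names `ancestors`, `encode`, and `is_valid` are all assumptions made for illustration. Each record is expanded into a flat set of Boolean items that includes every ancestor in the concept hierarchy, and candidate itemsets are then constrained to hold at most one item per dimension, which also rules out pairing an item with its own ancestor.

```python
# Hypothetical concept hierarchy (child item -> parent item); not from the paper.
HIERARCHY = {
    "age=20-29": "age=young",
    "age=30-39": "age=young",
    "income=10k-20k": "income=low",
}


def ancestors(item, hierarchy):
    """Return all ancestors of an item in the hierarchy, nearest first."""
    result = []
    while item in hierarchy:
        item = hierarchy[item]
        result.append(item)
    return result


def encode(record, hierarchy):
    """Turn one multi-dimensional record into a flat set of Boolean items.

    Every ancestor is added as well, so that patterns at higher levels of
    the hierarchy can be counted by an ordinary single-level miner.
    """
    items = set()
    for dimension, value in record.items():
        leaf = f"{dimension}={value}"
        items.add(leaf)
        items.update(ancestors(leaf, hierarchy))
    return items


def is_valid(itemset):
    """Assumed constraint: at most one item per dimension in an itemset."""
    dimensions = [item.split("=", 1)[0] for item in itemset]
    return len(dimensions) == len(set(dimensions))


record = {"age": "20-29", "income": "10k-20k"}
print(sorted(encode(record, HIERARCHY)))
# ['age=20-29', 'age=young', 'income=10k-20k', 'income=low']
```

Any single-dimension Boolean frequent-pattern miner could then be run over the encoded transactions, with `is_valid` used to prune candidate itemsets; how the paper actually enforces its constraints is not stated in this record.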

References (10)

  • 1. Ganti V, Gehrke J, Ramakrishnan R. Mining very large databases. Computer, 1999, 32(8): 38-45.
  • 2. Han J, Fu Y. Discovery of multiple-level association rules from large databases. IEEE Transactions on Knowledge and Data Engineering, 1999, 11(5): 798-805.
  • 3. Srikant R, Agrawal R. Mining generalized association rules. Umeshwar D, Peter M D G, Shojiro N. Proceedings of the 21st International Conference on Very Large Data Bases. San Francisco: Morgan Kaufmann Publishers Inc, 1995: 407-419.
  • 4. Agrawal R, Srikant R. Fast algorithms for mining association rules. Jorge B B, Matthias J, Carlo Z. Proceedings of the 20th International Conference on Very Large Data Bases. San Francisco: Morgan Kaufmann Publishers Inc, 1994: 487-499.
  • 5. Agrawal R, Imielinski T, Swami A. Mining association rules between sets of items in large databases. Peter B, Sushil J. Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data. ACM Press, 1993: 207-216.
  • 6. Miller R J, Yang Y. Association rules over interval data. Joan P. Proceedings of the 1997 ACM SIGMOD International Conference on Management of Data. ACM Press, 1997: 452-461.
  • 7. Srikant R, Agrawal R. Mining quantitative association rules in large relational tables. Jagadish H V, Inderpal S M. Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data. ACM Press, 1996: 1-12.
  • 8. Zou Xiang, Zhang Wei, Cai Qingsheng, Wang Qingyi. An efficient incremental updating algorithm for sequential patterns in large databases [J]. Journal of Nanjing University (Natural Science), 2003, 39(2): 165-171. Cited by: 10
  • 9. Department of Information and Computer Science, University of California at Irvine. UCI machine learning repository. http://www.ics.uci.edu/~mlearn/MLRepository.html, 2000.
  • 10. Borgelt C. Apriori implementation. http://fuzzy.cs.uni-magdeburg.de/~borgelt/src/apriori.exe, 2000.

Second-level references (15)

  • 1. Agrawal R, Srikant R. Mining sequential patterns. Proceedings of the International Conference on Data Engineering. IEEE Computer Society, 1995: 3-14.
  • 2. Agrawal R, Srikant R. Mining sequential patterns: Generalizations and performance improvements. Proceedings of the International Conference on Extending Database Technology. New York: Springer-Verlag, 1996: 3-17.
  • 3. Bettini C, Wang X S, Jajodia S. Mining temporal relationships with multiple granularities in time sequences. Data Engineering Bulletin, 1998, 21: 32-38.
  • 4. Ozden B, Ramaswamy S, Silberschatz A. Cyclic association rules. Proceedings of the International Conference on Data Engineering. IEEE Press, 1998: 412-421.
  • 5. Garofalakis M, Rastogi R, Shim K. SPIRIT: Sequential pattern mining with regular expression constraints. Proceedings of the International Conference on Very Large Data Bases. San Francisco: Morgan Kaufmann Publishers Inc, 1999: 223-234.
  • 6. Han J, Pei J, Mortazavi-Asl B, et al. FreeSpan: Frequent pattern-projected sequential pattern mining. Proceedings of the International Conference on Knowledge Discovery and Data Mining. ACM, 2000: 355-359.
  • 7. Han J, Pei J, Mortazavi-Asl B, et al. PrefixSpan: Mining sequential patterns efficiently by prefix-projected pattern growth. Proceedings of the International Conference on Data Engineering. IEEE Press, 2001: 215-226.
  • 8. Cheung D W, Han J, Ng V T, et al. Maintenance of discovered association rules: An incremental update technique. Proceedings of the 12th International Conference on Data Engineering. IEEE Press, 1996: 106-114.
  • 9. Cheung D W, Lee S D, Kao B. A general incremental technique for maintaining discovered association rules. Proceedings of the Fifth International Conference on Database Systems for Advanced Applications. Singapore: World Scientific Publishing, 1997: 185-194.
  • 10. Wang K. Discovering patterns from large and dynamic sequential data. Journal of Intelligent Information Systems, 1997: 8-33.

Co-citing documents (9)

Co-cited documents (90)

Citing documents (9)

Second-level citing documents (44)
