
Mining of Association Rules Based on the Operators of Set of Item Sequences

Cited by: 37
Abstract: Generating the maximal frequent set of item sequences is a key problem in association rule mining. Traditional algorithms, most of them based on the Apriori method of pruning the itemset lattice, require multiple scans of the transaction database. Recent work has tried to improve mining efficiency by reducing the number of database passes and therefore the I/O cost of mining. As computer performance improves, it becomes feasible to look for data structures that support efficient algorithms based on a single scan of the transaction database.

This paper first gives a rigorous definition of the set of item sequences and its basic properties, then defines operators aimed at mining association rules. Let $ISS_1$ and $ISS_2$ be sets of item sequences and let $IS$ be an item sequence. The main operators are defined as follows:

(1) $IS \in_{sub} ISS_1 \Leftrightarrow \exists\, IS_1 \in ISS_1$ such that $IS \subseteq IS_1$;
(2) $ISS_1 \subseteq_{sub} ISS_2 \Leftrightarrow \forall\, IS_1 \in ISS_1$, $IS_1 \in_{sub} ISS_2$;
(3) $ISS_1 \cap_{sub} ISS_2 = \{ IS \mid IS \in_{sub} ISS_1 \text{ and } IS \in_{sub} ISS_2 \}$;
(4) $ISS_1 \cup_{sub} ISS_2 = \{ IS \mid IS \in_{sub} ISS_1 \text{ or } IS \in_{sub} ISS_2 \}$.

Based on these definitions, the paper proposes an algorithm called ISS-DM, which builds up the maximal frequent set of item sequences step by step during a single scan of the transaction database and so avoids repeated scans. Unlike existing algorithms based on pruning the itemset lattice or its refinements, ISS-DM uses only two linear in-memory data structures ($ISS$ and $ISS^*$), and in some cases it achieves higher mining efficiency with less storage than other algorithms. Finally, the effectiveness of the algorithm is analyzed and experimental results are given. The experiments show that ISS-DM is efficient on transaction databases of moderate size and on some particular large databases.
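
As an illustration only, the four operators listed in the abstract can be sketched in Python. This sketch assumes each item sequence is modeled as a `frozenset` of items and a set of item sequences as a Python `set` of them. The function names are hypothetical and do not come from the paper. For (3) and (4) the sketch returns only the maximal members that generate the defined set, and it drops empty intersections; both are choices made here, not something the abstract states.

```python
from itertools import product


def in_sub(item_seq, iss):
    """Operator (1): item_seq is contained in some member of iss."""
    return any(item_seq <= member for member in iss)


def subset_sub(iss1, iss2):
    """Operator (2): every member of iss1 satisfies operator (1) against iss2."""
    return all(in_sub(member, iss2) for member in iss1)


def maximal(candidates):
    """Keep only the members not strictly contained in another member."""
    cands = set(candidates)
    return {c for c in cands if not any(c < other for other in cands)}


def intersect_sub(iss1, iss2):
    """Operator (3), represented by its maximal members.

    Any sequence contained in a member of both sets is contained in some
    pairwise intersection, so the maximal pairwise intersections suffice.
    Empty intersections are dropped (an assumption of this sketch).
    """
    pairwise = (a & b for a, b in product(iss1, iss2))
    return maximal(p for p in pairwise if p)


def union_sub(iss1, iss2):
    """Operator (4), represented by its maximal members."""
    return maximal(set(iss1) | set(iss2))


# Small made-up example; the items are single letters.
iss1 = {frozenset("abc"), frozenset("de")}
iss2 = {frozenset("bcd"), frozenset("e")}
common = intersect_sub(iss1, iss2)
merged = union_sub(iss1, iss2)
```

Storing only maximal members keeps both structures small, since every contained sequence can be recovered with `in_sub`. Whether ISS-DM stores its sets this way is not stated in the abstract.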
Source: Chinese Journal of Computers (《计算机学报》), indexed in EI, CSCD, and the Peking University Core Journals list; 2002, No. 4, pp. 417-422 (6 pages)
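Funding: National Natural Science Foundation of China (60173014); Beijing Natural Science Foundation (4022003); funding from the Beijing Municipal Education Commission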
Keywords: data mining; association rule; set of item sequences; frequent set of item sequences; algorithm; database
