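Abstract
Generating the set of maximal frequent item sequences is a key problem in association rule mining, and traditional algorithms do so through multiple scans of the transaction database. Recent research has begun to seek higher efficiency by reducing the number of database scans and thereby the I/O cost of the mining process. As computer performance improves, it has become feasible to explore data structures that support efficient algorithms based on a single scan of the transaction database. This paper first gives rigorous definitions of the set of item sequences and its basic operations, and then proposes an algorithm, called ISS-DM, for generating the set of maximal frequent item sequences. ISS-DM evolves its working set step by step into the set of maximal frequent item sequences during a single scan of the transaction database.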
Discovering the set of frequent item sequences in a transaction database is one of the most important tasks in mining association rules. Many algorithms have been proposed in the literature, but most of them follow the Apriori approach of pruning the itemset lattice, which requires repeated passes over the transaction database. Recent algorithms attempt to improve mining efficiency by reducing the number of database passes and thus the I/O cost. In this paper, we first define the Set of Item Sequences (ISS) and its basic properties, and then introduce operators aimed at mining association rules. Let $ISS_1$ and $ISS_2$ be sets of item sequences and let $IS$ be an item sequence. The main operators are defined as follows: (1) $IS \in_{sub} ISS_1 \Leftrightarrow \exists\, IS_1 \in ISS_1$ such that $IS \subseteq IS_1$; (2) $ISS_1 \subseteq_{sub} ISS_2 \Leftrightarrow \forall\, IS_1 \in ISS_1$, $IS_1 \in_{sub} ISS_2$; (3) $ISS_1 \cap_{sub} ISS_2 = \{ IS \mid IS \in_{sub} ISS_1 \text{ and } IS \in_{sub} ISS_2 \}$; (4) $ISS_1 \cup_{sub} ISS_2 = \{ IS \mid IS \in_{sub} ISS_1 \text{ or } IS \in_{sub} ISS_2 \}$. Based on these definitions, we propose a new efficient algorithm, called ISS-DM, which avoids repeatedly scanning the transaction database when mining association rules. Unlike existing algorithms, which are based on pruning the itemset lattice or on improvements of that method, our algorithm uses only two linear in-memory data structures ($ISS$ and $ISS^*$), and in some cases it achieves higher mining efficiency with less storage than other algorithms. Finally, the effectiveness of the algorithm is analyzed and some experimental results are given. The experiments show that ISS-DM is efficient on transaction databases of moderate size and on some particular large databases.
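The four operators listed in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: it assumes that the relation between IS and IS 1 in operator (1) is containment, and it models an item sequence as a `frozenset` of single-character items, which ignores any ordering the paper may impose. The names `iss`, `in_sub`, `subset_sub`, `in_intersection_sub`, and `in_union_sub` are hypothetical. Operators (3) and (4) are written as membership tests because the abstract defines them by set-builder conditions.

```python
from typing import FrozenSet, Iterable, Set

# Hypothetical representations (assumptions, not taken from the paper):
ItemSeq = FrozenSet[str]    # an item sequence, modeled as a set of items
ItemSeqSet = Set[ItemSeq]   # a set of item sequences (ISS)


def iss(*seqs: Iterable[str]) -> ItemSeqSet:
    """Build an ISS from iterables of items, e.g. iss("abc", "cd")."""
    return {frozenset(s) for s in seqs}


def in_sub(seq: Iterable[str], iss1: ItemSeqSet) -> bool:
    """Operator (1): seq is contained in at least one member of iss1."""
    target = frozenset(seq)
    return any(target <= member for member in iss1)


def subset_sub(iss1: ItemSeqSet, iss2: ItemSeqSet) -> bool:
    """Operator (2): every member of iss1 satisfies operator (1) w.r.t. iss2."""
    return all(in_sub(member, iss2) for member in iss1)


def in_intersection_sub(seq: Iterable[str], iss1: ItemSeqSet, iss2: ItemSeqSet) -> bool:
    """Membership test for operator (3): seq is sub-contained in both sets."""
    return in_sub(seq, iss1) and in_sub(seq, iss2)


def in_union_sub(seq: Iterable[str], iss1: ItemSeqSet, iss2: ItemSeqSet) -> bool:
    """Membership test for operator (4): seq is sub-contained in either set."""
    return in_sub(seq, iss1) or in_sub(seq, iss2)


if __name__ == "__main__":
    a = iss("abc", "cd")
    b = iss("ab", "cde")
    print(in_sub("ab", a))                  # "ab" is contained in "abc"
    print(subset_sub(b, a))                 # "cde" is contained in no member of a
    print(in_intersection_sub("c", a, b))
    print(in_union_sub("bc", a, b))
```

Under this reading, an ISS that stores only maximal sequences implicitly represents all of their subsequences, which is one plausible reason a linear structure could stand in for an explicit itemset lattice.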
Source
《计算机学报》
EI
CSCD
PKU Core Journal (北大核心)
2002, Issue 4, pp. 417-422 (6 pages)
Chinese Journal of Computers
Funding
National Natural Science Foundation of China (60173014)
Beijing Natural Science Foundation (4022003)
Funded by the Beijing Municipal Education Commission