Abstract
Given a large collection of transactions containing items, a basic data mining task is to extract the so-called frequent itemsets. This paper proposes extracting a condensed representation of the frequent itemsets, called the frequent rule-free sets, instead of the whole collection of frequent itemsets. We show that this condensed representation can be used to regenerate all frequent itemsets and their exact frequencies (supports) without any access to the original data, and that it can be extracted efficiently. We give an algorithm named HOPE-Ⅲ to extract the frequent rule-free sets and compare it with A-Close, an algorithm that extracts another condensed representation of frequent itemsets previously investigated in the literature, the frequent closed sets. In the experiments, HOPE-Ⅲ was much more efficient than A-Close in all cases.
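To make the idea of a condensed representation concrete, the following is a minimal Python sketch on a hypothetical toy database. It does not implement HOPE-Ⅲ or rule-free sets, whose definitions the abstract does not give. Instead it uses the frequent closed sets (the representation the abstract attributes to A-Close) and relies on the standard fact that the support of a frequent itemset equals the largest support among the closed sets that contain it. All function names and the transaction data are assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical toy database: each transaction is a set of items.
TRANSACTIONS = [
    {"a", "b", "c"},
    {"a", "b", "c"},
    {"a", "b"},
    {"c"},
]


def support(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def frequent_itemsets(transactions, min_support):
    """Brute-force enumeration; acceptable for a toy example only."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            count = support(candidate, transactions)
            if count >= min_support:
                result[candidate] = count
    return result


def closed_itemsets(frequent):
    """Keep the itemsets that have no proper superset of equal support."""
    return {
        x: s
        for x, s in frequent.items()
        if not any(x < y and s == sy for y, sy in frequent.items())
    }


def support_from_closed(itemset, closed):
    """Recover a support from the closed sets alone, without the database.

    Returns None when no closed set contains `itemset`, i.e. when the
    itemset is not frequent.
    """
    candidates = [s for c, s in closed.items() if itemset <= c]
    return max(candidates) if candidates else None


freq = frequent_itemsets(TRANSACTIONS, min_support=2)
closed = closed_itemsets(freq)

# Every frequent itemset's exact support is recovered from `closed` only.
for itemset, count in freq.items():
    assert support_from_closed(itemset, closed) == count

print(len(freq), "frequent itemsets,", len(closed), "closed itemsets")
```

On this toy data the closed sets are fewer than the frequent itemsets yet still determine every support exactly. That is the property the abstract claims for rule-free sets as well.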
Source
《计算机工程与科学》
CSCD
2005, No. 9, pp. 62-63 (2 pages)
Computer Engineering & Science
Funding
Supported by the National Key Science and Technology Research Program of the Tenth Five-Year Plan (2001BA102A040203)
Keywords
data mining
condensed representation
frequent itemset
rule-free set