Abstract
Building on a study of several well-known frequent closed itemset mining algorithms (CLOSET+, FP-CLOSE, DCI-CLOSED and LCMv2) and on a theoretical analysis of the mining problem, this paper proposes a new mining algorithm named Cherry. Cherry is based on the FP-tree structure and adopts a novel CherryItem detection technique, which directly identifies the frequent-item prefixes that would lead to unclosed or duplicate frequent itemsets, without keeping the frequent closed itemsets mined so far in main memory. This greatly improves mining efficiency. In performance tests on a number of synthetic and real data sets, Cherry is compared with state-of-the-art algorithms such as FP-CLOSE, LCMv2 and DCI-CLOSED. The experimental results show that Cherry outperforms them at low support thresholds.
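For readers unfamiliar with the term, the following is a minimal brute-force sketch of what a frequent closed itemset is: a frequent itemset that has no proper superset with the same support. It is not the Cherry algorithm and uses neither an FP-tree nor CherryItem detection; the function name `closed_frequent_itemsets` and the toy transactions are assumptions made purely for illustration.

```python
from itertools import combinations


def closed_frequent_itemsets(transactions, min_support):
    """Brute-force enumeration of frequent closed itemsets.

    Illustrative only: exponential in the number of items, and unrelated
    to the FP-tree based method described in the abstract.
    """
    items = sorted(set().union(*transactions))

    # Count the support of every candidate itemset and keep the frequent ones.
    support = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            itemset = frozenset(candidate)
            count = sum(1 for t in transactions if itemset <= t)
            if count >= min_support:
                support[itemset] = count

    # An itemset is closed if no proper superset has the same support.
    return {
        itemset: count
        for itemset, count in support.items()
        if not any(
            itemset < other and count == other_count
            for other, other_count in support.items()
        )
    }


# Hypothetical toy database of four transactions.
transactions = [frozenset("abc"), frozenset("abc"), frozenset("ab"), frozenset("c")]
result = closed_frequent_itemsets(transactions, min_support=2)
```

In this toy database, `{a}` and `{b}` are frequent but not closed, because `{a, b}` occurs in exactly the same three transactions. A miner that checks closedness against all previously found itemsets, as this sketch does, has to keep them in memory; the abstract states that Cherry avoids this.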
Source
Journal of Software (《软件学报》)
EI
CSCD
Peking University Core Journals (北大核心)
2008, No. 2, pp. 379-388 (10 pages)
Funding
Supported by the National Natural Science Foundation of China (国家自然科学基金) under Grant No. 60673116
Keywords
association rule
frequent closed itemset