Abstract
The join and pruning steps of the Eclat algorithm are time-consuming. To address this, the data set is partitioned into equivalence classes according to the joinability of itemsets and stored in segments; with a last-item pruning strategy, the join and pruning operations complete in constant time. Intersecting long transaction sets in Eclat is also computationally expensive. To address this, the transaction sets of itemsets are stored in segments in a multidimensional array, so that intersecting two long sets becomes intersecting short sets segment by segment. The concept of expected support is introduced to predict the support while the intersection is being computed, which reduces the number of comparisons. Experimental results show that the proposed algorithm outperforms Eclat in running time and is particularly suited to mining sparse data sets with long patterns.
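The segment-by-segment intersection with a support prediction can be illustrated with a minimal Python sketch. The abstract does not define expected support precisely, so this sketch assumes one simple reading: an optimistic upper bound equal to the matches found so far plus the smaller number of transaction IDs still unprocessed on either side. The names `SEGMENT_SIZE`, `to_segments`, and `bounded_intersection` are hypothetical and are not taken from the paper, and a dictionary of lists stands in for the paper's multidimensional array.

```python
SEGMENT_SIZE = 4  # hypothetical segment width, chosen only for illustration


def to_segments(tids, segment_size=SEGMENT_SIZE):
    """Group transaction IDs into {segment index: sorted list of IDs}."""
    segments = {}
    for tid in sorted(tids):
        segments.setdefault(tid // segment_size, []).append(tid)
    return segments


def support(segments):
    """Number of transaction IDs stored across all segments."""
    return sum(len(part) for part in segments.values())


def bounded_intersection(seg_a, seg_b, min_support):
    """Intersect two segmented tidsets one segment at a time.

    Returns the segmented intersection, or None as soon as the optimistic
    bound on the final support (an assumed stand-in for "expected support")
    drops below min_support.
    """
    remaining_a = support(seg_a)
    remaining_b = support(seg_b)
    matched = 0
    result = {}
    for key in sorted(seg_a.keys() | seg_b.keys()):
        part_a = seg_a.get(key, [])
        part_b = seg_b.get(key, [])
        common = sorted(set(part_a).intersection(part_b))
        if common:
            result[key] = common
            matched += len(common)
        remaining_a -= len(part_a)
        remaining_b -= len(part_b)
        # Future matches cannot exceed what is left on the shorter side.
        if matched + min(remaining_a, remaining_b) < min_support:
            return None  # the candidate itemset cannot be frequent
    return result


a = to_segments([1, 2, 3, 5, 8, 9, 13, 21])
b = to_segments([2, 3, 5, 7, 9, 13, 20, 21])
frequent = bounded_intersection(a, b, min_support=3)

c = to_segments([1, 2, 30, 31])
d = to_segments([3, 30, 40, 41])
pruned = bounded_intersection(c, d, min_support=3)  # stops after the first segment
```

In the second call the first segment yields no match and only two IDs remain in `c`, so the bound falls to 2 and the remaining segments are never compared. This is the kind of saving in comparisons that the abstract attributes to predicting support during the intersection.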
Source
Journal of Jilin University (Science Edition)
CAS
CSCD
Peking University Core Journals
2016, No. 3, pp. 553-560 (8 pages)
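Funding
National Natural Science Foundation of China (Grant No. 61133011)
"Twelfth Five-Year Plan" Science and Technology Research Project of the Education Department of Jilin Province (Grant No. 2013431)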
Keywords
frequent itemset
vertical format
segmented storage
expected support