Abstract
Existing Top-k high utility itemset mining methods replace the real utility of an itemset with its transaction utility in order to preserve the downward closure property. This overestimates itemset utilities, which weakens pruning and lowers mining efficiency. To address this problem, the concept of index utility is proposed. On this basis, a two-level index is built and index pruning is applied, which strengthens pruning during mining and improves the efficiency of Top-k high utility itemset mining. In addition, a utility matrix is built to support fast computation of itemset utilities, further improving mining efficiency. Experiments on different types of datasets verify the effectiveness and efficiency of the proposed method.
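The overestimation the abstract refers to can be illustrated with a minimal sketch. It uses the standard textbook definitions of itemset utility and transaction-weighted utility (TWU), not the paper's index utility, two-level index, or utility matrix, none of which are defined in this abstract. The toy database `DB` and the function names `utility`, `twu`, and `top_k_bruteforce` are assumptions made for illustration only.

```python
from itertools import combinations

# Hypothetical toy database: each transaction maps an item to its utility in
# that transaction (for example, quantity times unit profit, already multiplied).
DB = [
    {"a": 5, "b": 2, "c": 1},
    {"a": 10, "c": 6},
    {"b": 4, "c": 3, "d": 8},
]

def utility(itemset, db):
    """Real utility: sum the itemset's own item utilities over every
    transaction that contains the whole itemset."""
    items = set(itemset)
    return sum(sum(t[i] for i in items) for t in db if items.issubset(t))

def twu(itemset, db):
    """Transaction-weighted utility: sum the full transaction utility of every
    transaction that contains the itemset. It is an upper bound on the real
    utility and, unlike the real utility, never grows when items are added."""
    items = set(itemset)
    return sum(sum(t.values()) for t in db if items.issubset(t))

def top_k_bruteforce(db, k):
    """Naive baseline (no pruning): score every itemset by real utility and
    return the k best as (utility, itemset) pairs."""
    all_items = sorted(set().union(*db))
    scored = [
        (utility(c, db), c)
        for r in range(1, len(all_items) + 1)
        for c in combinations(all_items, r)
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:k]

print(utility(("a",), DB), utility(("a", "c"), DB))  # 15 22: real utility is not anti-monotone
print(twu(("a",), DB), twu(("a", "c"), DB))          # 24 24: TWU is anti-monotone but looser
print(utility(("c",), DB), twu(("c",), DB))          # 10 39: a large overestimate
```

In this toy data the item `c` has a real utility of 10 but a TWU of 39, so a miner that prunes on TWU alone would keep it as a candidate far longer than its real utility justifies. This is the kind of loose bound that a tighter estimate is meant to avoid.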
Source
《东北大学学报(自然科学版)》
EI
CAS
CSCD
Peking University Core Journals (北大核心)
2016, Issue 1, pp. 24-28 (5 pages)
Journal of Northeastern University (Natural Science)
Funding
National Natural Science Foundation of China (Grant No. 61272177)
Keywords
itemset utility
index utility
Top-k high utility itemset
ending super itemset
utility matrix