Abstract
To address the low efficiency and limited accuracy of frequent itemset mining on uncertain data, a method named UFPGA (uncertain frequent pattern genetic algorithm) is proposed. It mines probabilistic frequent itemsets from uncertain data using an improved frequent pattern tree (FP-tree) and a genetic algorithm (GA). Based on the structural characteristics of uncertain data, the FP-tree method is adapted to mine frequent itemsets from uncertain data, and a genetic algorithm with a reduced mutation space and an added breeding operator is used to search for maximal frequent itemsets. This narrows the search range and improves mining efficiency. Experimental results show that the method performs well in terms of time complexity and offers an effective technique for mining large-scale uncertain data.
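As background for the terms used in the abstract, the following is a minimal sketch of how the support of an itemset is usually defined over uncertain data. It is not the UFPGA algorithm and does not build an FP-tree or run a genetic algorithm. It assumes the common model in which each item in a transaction carries an independent existence probability. The toy database and the names UNCERTAIN_DB, containment_probability, expected_support, and frequentness_probability are hypothetical and chosen only for illustration.

```python
from math import prod

# Hypothetical toy uncertain database (an assumption, not data from the paper):
# each transaction maps an item to the probability that the item is present.
UNCERTAIN_DB = [
    {"a": 0.9, "b": 0.8, "c": 0.5},
    {"a": 0.6, "b": 0.5},
    {"b": 0.7, "c": 0.4},
    {"a": 1.0, "c": 0.5},
]


def containment_probability(transaction, itemset):
    """Probability that the transaction contains every item of the itemset,
    assuming the items occur independently of each other."""
    return prod(transaction.get(item, 0.0) for item in itemset)


def expected_support(db, itemset):
    """Expected number of transactions that contain the itemset."""
    return sum(containment_probability(t, itemset) for t in db)


def frequentness_probability(db, itemset, min_support):
    """Probability that at least min_support transactions contain the itemset.

    The support count is a sum of independent Bernoulli variables, so its
    distribution is built up one transaction at a time.
    """
    # dist[k] = probability that exactly k of the transactions processed
    # so far contain the itemset.
    dist = [1.0]
    for t in db:
        p = containment_probability(t, itemset)
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1.0 - p)   # this transaction does not contain it
            nxt[k + 1] += mass * p       # this transaction contains it
        dist = nxt
    return sum(dist[min_support:])
```

Under this model, an itemset is typically called probabilistically frequent when frequentness_probability(db, itemset, min_support) reaches a user-chosen probability threshold. A maximal frequent itemset is a frequent itemset with no frequent proper superset.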
Source
Journal of Huazhong University of Science and Technology (Natural Science Edition)
EI
CAS
CSCD
Peking University Core Journals
2015, No. 9, pp. 29-34 (6 pages)
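Funding
National Key Technology R&D Program of China (2012BAF12B14)
Guizhou Province Major Science and Technology Special Project ([2012]6018, [2013]6019)
Guizhou Province Science and Technology Fund ([2011]2196)
Guizhou Province Industrial Key Technology Project ([2014]3004)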
Keywords
data mining
uncertain data
frequent itemsets
maximal frequent itemsets
frequent pattern tree
genetic algorithm