Abstract
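We propose a method that partitions attribute values into intervals or sets according to information entropy and determines the concept hierarchy by combining automatic generation with human-computer interaction, thereby transforming the multidimensional, multilevel, multi-datatype problem into a constrained single-dimensional, single-level Boolean problem. On this basis, the FPT-Gen algorithm, which generates frequent patterns directly, is extended into MDML-FPT-Gen, a new algorithm that effectively mines multidimensional multilevel association rules; its efficiency and scalability are both better than those of the classical methods.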
Association rule discovery plays an important role in data mining. Most proposed algorithms are based on Apriori, which scans the database as many times as the maximal pattern length. This makes them inefficient for mining multidimensional multilevel rules, where patterns longer than 20 items are not uncommon. Moreover, current approaches handle quantitative attributes by merging adjacent ranges into simple concept hierarchies, which are too simple to be useful in real applications. To address these problems, this paper presents a method based on information entropy for partitioning quantitative attributes into intervals and qualitative attributes into value sets. A combined automatic and interactive approach to concept hierarchy formation is also proposed. On this basis, multidimensional, multilevel, multi-datatype association rules can be mined by constrained single-dimensional, single-level Boolean algorithms. FPT-Gen, an algorithm we recently proposed for mining frequent patterns, is discussed in detail, and the design of a new algorithm derived from it, MDML-FPT-Gen, is presented. Experimental evaluations show that MDML-FPT-Gen is more efficient and scalable than classical Apriori-based algorithms.
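The reduction to a constrained single-dimensional, single-level Boolean problem can be illustrated with a minimal Python sketch. It is not the paper's MDML-FPT-Gen algorithm: the hierarchy contents, the names `HIERARCHY`, `encode`, `satisfies_constraint` and `support`, and the particular constraint (at most one item per dimension) are all assumptions made for illustration only.

```python
# Hypothetical concept hierarchies: leaf value -> ancestors, most specific first.
# The dimensions and values are invented for this example.
HIERARCHY = {
    "age": {"20-29": ["young"], "30-39": ["young"], "40-59": ["middle"]},
    "city": {"Hangzhou": ["Zhejiang"], "Nanjing": ["Jiangsu"]},
}


def encode(record):
    """Map one multidimensional record to a set of Boolean items.

    Each value and each of its ancestors becomes an item 'dimension=value',
    so every level of every dimension lives in one flat item space.
    """
    items = set()
    for dim, value in record.items():
        items.add(f"{dim}={value}")
        for ancestor in HIERARCHY.get(dim, {}).get(value, []):
            items.add(f"{dim}={ancestor}")
    return items


def satisfies_constraint(itemset):
    """Assumed constraint: at most one item per dimension.

    This rules out trivial patterns that pair a value with its own ancestor.
    """
    dims = [item.split("=", 1)[0] for item in itemset]
    return len(dims) == len(set(dims))


def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


records = [
    {"age": "20-29", "city": "Hangzhou"},
    {"age": "30-39", "city": "Hangzhou"},
    {"age": "40-59", "city": "Nanjing"},
]
transactions = [encode(r) for r in records]
candidate = {"age=young", "city=Zhejiang"}
if satisfies_constraint(candidate):
    print(candidate, support(candidate, transactions))
```

Once records are encoded this way, any single-level Boolean frequent-pattern miner can in principle run over `transactions`, with the constraint check pruning candidates that mix levels of the same dimension.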
Source
《南京大学学报(自然科学版)》
CAS
CSCD
PKU Core (北大核心)
2003, No. 2, pp. 205-210 (6 pages)
Journal of Nanjing University (Natural Science)
Funding
Natural Science Foundation of Zhejiang Province (602140)
Scientific Research Program of the Zhejiang Provincial Department of Education (20020635)