Abstract
Many existing improvements to the Apriori algorithm are analyzed and their incompleteness is pointed out. A two-tuple representation of transactions is proposed: the original transaction database is replaced by pairs consisting of a string of field values and the number of times that string occurs, and scanning is carried out on this representation. The memory it occupies depends only on the cardinality (basis) of the database, not on the size of the database. The whole process scans the database only once and all remaining work is done in memory, so the method performs well when the cardinality of the database is small. In addition, an index structure based on the two-tuple representation is defined to represent frequent itemsets. The scheme uses little memory and runs fast.
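The two-tuple idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no code, so the function names `compress` and `frequent_itemsets`, the treatment of each transaction as a set of string items, and the plain level-wise Apriori search are all assumptions made for illustration. The index structure for frequent itemsets mentioned in the abstract is not reproduced here.

```python
from collections import Counter
from itertools import combinations


def compress(transactions):
    """Single pass over the data: map each distinct value string
    (here a sorted tuple of items, an assumed encoding) to its count."""
    return Counter(tuple(sorted(set(t))) for t in transactions)


def frequent_itemsets(pairs, min_support):
    """Apriori-style level-wise search that reads only the in-memory
    (value string, count) pairs, never the original transactions."""
    rows = [(frozenset(items), n) for items, n in pairs.items()]

    # Level 1: support of single items, weighted by the stored counts.
    item_counts = Counter()
    for row, n in rows:
        for item in row:
            item_counts[frozenset([item])] += n
    current = {s: c for s, c in item_counts.items() if c >= min_support}
    result = dict(current)

    k = 2
    while current:
        # Join frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a, b in combinations(current, 2)
                      if len(a | b) == k}
        # Prune candidates that have an infrequent (k-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k - 1))}
        counts = Counter()
        for row, n in rows:
            for c in candidates:
                if c <= row:
                    counts[c] += n  # one pair stands for n transactions
        current = {c: n for c, n in counts.items() if n >= min_support}
        result.update(current)
        k += 1
    return result


if __name__ == "__main__":
    db = [["a", "b", "c"], ["a", "b"], ["a", "b", "c"], ["b", "c"], ["a", "b"]]
    pairs = compress(db)
    for itemset, support in frequent_itemsets(pairs, min_support=3).items():
        print(sorted(itemset), support)
```

In this sketch the five transactions collapse to three pairs, so the counting loops run over distinct value strings rather than over every transaction, which is why memory use follows the number of distinct strings instead of the database size.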
Source
《计算机工程与设计》
CSCD
PKU Core Journal (北大核心)
2009, Issue 16, pp. 3811-3813 (3 pages)
Computer Engineering and Design
Funding
Inner Mongolia Talent Fund Project (8th batch)
Inner Mongolia Education Scientific Research Fund Project (NJZY07140)