Abstract
With the rapid development of information technology, databases keep growing in size and produce large volumes of data. Mining valuable rules and patterns from such data with traditional data mining techniques requires enormous computing resources and a long time. Taking into account mining efficiency, load balancing, the running environment, node status, and other factors, this paper proposes a new parallel data mining algorithm. The parallel computing units communicate with one another through a global communication mode, the Master-Worker model, which reduces the communication cost of parallel data mining, improves mining efficiency, and shortens mining time. Finally, experiments on Worker and Master nodes use a large database with many attributes, and the results are compared with those of the serial algorithm. The experimental results verify the effectiveness of the algorithm and its superiority in mining large data sets.
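The abstract does not give the algorithm itself, so the following is only a generic sketch of the Master-Worker pattern with dynamic scheduling, not the paper's method. It assumes a simple mining task (counting item support across transactions), uses threads in place of separate compute nodes, and all names (`worker`, `master_count_support`, `frequent_items`, `chunk_size`) are hypothetical. The master splits the data into partitions and places them on a shared task queue; each worker pulls a new partition as soon as it finishes the previous one, so faster workers naturally take on more work.

```python
import queue
import threading
from collections import Counter


def worker(tasks, results):
    """Pull partitions from the master until a None sentinel arrives."""
    while True:
        partition = tasks.get()
        if partition is None:  # sentinel: no more work for this worker
            break
        counts = Counter()
        for transaction in partition:
            # Count each item at most once per transaction (support count).
            counts.update(set(transaction))
        results.put(counts)  # send the partial result back to the master


def master_count_support(transactions, n_workers=4, chunk_size=2):
    """Master: partition the data, dispatch it, and merge partial counts."""
    tasks = queue.Queue()
    results = queue.Queue()

    # Small partitions let idle workers pick up more work (dynamic scheduling).
    for start in range(0, len(transactions), chunk_size):
        tasks.put(transactions[start:start + chunk_size])
    for _ in range(n_workers):
        tasks.put(None)  # one sentinel per worker

    threads = [
        threading.Thread(target=worker, args=(tasks, results))
        for _ in range(n_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # All workers have finished, so the result queue is complete.
    total = Counter()
    while not results.empty():
        total.update(results.get())
    return total


def frequent_items(transactions, min_support, **kwargs):
    """Return the items whose support count reaches min_support."""
    counts = master_count_support(transactions, **kwargs)
    return {item: n for item, n in counts.items() if n >= min_support}
```

In this sketch, workers talk only to the master through the two queues and never to each other, which is one way a Master-Worker layout can keep communication cost low. A real deployment on separate nodes would replace the queues with a message-passing layer.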
Source
Computer Measurement & Control (《计算机测量与控制》)
PKU Core Journal
2013, No. 4, pp. 1008-1010, 1026 (4 pages)
Funding
2011 Development Program of the Henan Provincial Department of Science and Technology (112102310527)
Keywords
data mining
parallel algorithm
dynamic scheduling
global communication mode (master-worker model)