摘要
依据各级缓存容量,将CPU主存中种群个体和蚂蚁个体数据划分存储到一级、二级和三级缓存中,以减少并行计算过程中数据在各级存储之间的传输开销,在CPU与GPU之间采取异步传送和不完全传送数据、GPU多个内核函数异步执行多个流的方法,设置GPU block线程数量为16的倍数、GPU共享存储器划分大小为32倍的bank,使用GPU常量存储器存储交叉概率、变异概率等需频繁访问的只读参数,将输入串矩阵和重叠部分长度矩阵只读大数据结构绑定到GPU纹理存储器,设计实现了一种多核CPU和GPU协同求解最短公共超串问题的计算、存储和通信高效的并行算法。求解多种规模的最短公共超串问题的实验结果表明,多核CPU与GPU协同并行算法比串行算法快70倍以上。
According to the capacity of multi-level caches, the population individuality and ant data in CPU main memory were assigned to L3 cache, L2 cache and L1 cache to reduce data transfer overhead among multiple caches during parallel computing. The asynchronous and incomplete transmission was performed between CPU and GPU, and multiple flows were asynchronously executed by multiple GPU kernel functions. The thread number of GPU block was set to the size of 16 times and GPU public memory was divided into bank with the size of 32 times. GPU constant memory was used to store read-only parameters such as cross probability and mutate probability which were read frequently. The read-only big data structure such as string set and overlap matrix were bound to GPU texture memory, and a computation, cache and communication-efficient parallel algorithm for CPU and GPU to coordinate solving shortest common superstring problem was designed and implemented. The experimental results for solving shortest common superstring problem with several sizes show the proposed CPU and GPU parallel algorithm is faster over 70 times than the sequential algorithm.
出处
《计算机应用》
CSCD
北大核心
2014年第7期1857-1861,1866,共6页
journal of Computer Applications
基金
国家自然科学基金资助项目(60963001)
广西研究生教育创新计划项目(YCSZ2013006)
广西教育厅-广西大学博士点建设基金资助项目(P11900119)
关键词
最短公共超串
并行算法
GPU计算
遗传算法
蚁群算法
shortest common superstring
parallel algorithm
GPU computing
genetic algorithm
ant colony optimization