A Hybrid MPI/CUDA Implementation of Kirchhoff Time Migration on Multi-GPU Clusters

Cited by: 2
Abstract: We present a hybrid MPI/CUDA system for accelerating 2D/3D Kirchhoff time migration on multi-GPU clusters. The system follows the classic master-slave pattern and uses CPU/GPU cooperative computing, with parallelism mapped to three levels: MPI process, pthread thread, and CUDA thread. Each computing node consists of a multi-core CPU and multiple GPUs. The master node loads the input data and distributes it equally among the slave nodes, where it is divided into canvas blocks according to the resources available on each node. Each slave node receives its data from the master node and stores it in temporary files on local disk. On each slave node, one thread is created per detected GPU device, and each thread controls exactly one GPU. Each canvas block is further divided into equally sized trace blocks, one per thread; each thread preprocesses its traces one by one on the CPU and then transfers them to the GPU, where the image points of each trace are processed in parallel by CUDA threads. The migration result for each canvas block is accumulated from the related traces of every thread and is sent back to the master node, which accumulates the results and writes them to disk files. Performance is further improved through CPU/GPU cooperation and linear interpolation of the travel time. The system was evaluated on a typical heterogeneous multi-GPU cluster in which each computing node has a quad-core CPU with 8 GB of memory and two C1060 GPUs with 6 GB of memory. For different problem scales and integral modes, the experimental results show that the hybrid system achieves an average speedup of about 5x to 10x over the multi-threaded MPI version of the system on the same platform, using the same number of nodes and 4 threads on each slave node.
Source: E-science Technology & Application (《科研信息化技术与应用》), 2012, No. 5, pp. 34-41 (8 pages)
Funding: Scientific Research Informatization Application Promotion Project of the Chinese Academy of Sciences (XXH12503)
Keywords: seismic data processing; Kirchhoff time migration; CPU/GPU cooperation; MPI; CUDA
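
The decomposition described in the abstract works because Kirchhoff migration is a sum over traces: partial images computed from disjoint trace blocks can simply be added together. The following is a minimal CPU-only sketch of that idea in Python with NumPy, not the authors' implementation. It assumes a 2D geometry, a constant velocity, and a double-square-root travel time; the function names (`travel_time`, `migrate_traces`, `migrate_partitioned`) and the `stride` parameter are hypothetical. Plain loops stand in for the MPI processes, pthreads, and CUDA threads, and the coarse-grid scheme is only one possible reading of "linear interpolation of the travel time".

```python
import numpy as np

def travel_time(t0, dx_src, dx_rcv, v):
    """Double-square-root travel time, assuming a constant velocity v."""
    t_src = np.sqrt((t0 / 2.0) ** 2 + (dx_src / v) ** 2)
    t_rcv = np.sqrt((t0 / 2.0) ** 2 + (dx_rcv / v) ** 2)
    return t_src + t_rcv

def migrate_traces(traces, src_x, rcv_x, image_x, dt, v, stride=1):
    """Migrate one block of traces into a partial image of shape (n_x, n_t)."""
    n_t = traces.shape[1]
    t0 = np.arange(n_t) * dt
    # Coarse time samples (always including the last one) where the travel
    # time is evaluated exactly; values in between are linearly interpolated.
    coarse = np.unique(np.append(np.arange(0, n_t, stride), n_t - 1))
    image = np.zeros((len(image_x), n_t))
    for trace, sx, rx in zip(traces, src_x, rcv_x):   # trace by trace
        for ix, x in enumerate(image_x):              # GPU analogue: image points in parallel
            t_coarse = travel_time(t0[coarse], x - sx, x - rx, v)
            t = np.interp(t0, t0[coarse], t_coarse)
            # Read the trace amplitude at the travel time; zero outside the record.
            image[ix] += np.interp(t, t0, trace, left=0.0, right=0.0)
    return image

def migrate_partitioned(traces, src_x, rcv_x, image_x, dt, v,
                        n_nodes, gpus_per_node):
    """Split traces across nodes, then across per-GPU workers, and sum results."""
    total = np.zeros((len(image_x), traces.shape[1]))
    for node_idx in np.array_split(np.arange(len(traces)), n_nodes):
        node_image = np.zeros_like(total)
        for gpu_idx in np.array_split(node_idx, gpus_per_node):
            node_image += migrate_traces(traces[gpu_idx], src_x[gpu_idx],
                                         rcv_x[gpu_idx], image_x, dt, v)
        total += node_image   # master-side accumulation
    return total
```

With `stride=1` the travel time is evaluated at every sample; a larger `stride` trades accuracy for fewer square-root evaluations. Up to floating-point rounding, `migrate_partitioned` returns the same image as a single call to `migrate_traces` on all traces, which is the property that lets the work be spread over nodes and GPUs.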