期刊文献+

六边形区域快速傅里叶变换的CUDA-MPI算法及其实现 被引量:4

A CUDA-MPI ALGORITHM FOR THE FAST FOURIER TRANSFORM ON THE HEXAGON AND ITS IMPLEMENTATION
原文传递
导出
摘要 本文研究六边形区域上快速傅里叶变换(FFTH)的CUDA—MPI算法及其实现.首先,我们通过充分利用CUDA的层次化并行机制及其库函数,设计了FFTH的高效率的CUDA算法.对于规模为3X2048。的双精度复数类型数据,我们设计的CUDA程序与CPU串行程序相比可以达到12倍加速比,如果不计内存和显存之间的数据传输,则加速比可达40倍;其计算效率与CUFFT所提供的二维方形区域FFT程序的效率基本一致.在此基础上,我们通过研究GPU上分布式并行数据的转置与排序算法,优化设计了FFTH的CUDA-MPI算法.在3×8192^2的数据规模、10节点X6GPU的计算环境下,我们的CUDA-MPI程序与CPU串行程序相比达到了55倍的加速;其效率比MPI并行版FFTW以及基于CUFFT本地计算和FFTW并行转置的方形区域并行FFT的效率都要高出很多.FFTH的CUDA-MPI算法研究和测试为大规模CPU+GPU异构计算机系统的可扩展新型算法的探索提供了参考. In this paper, we study the parallel algorithm based on CUDA and MPI for the Fast Fourier Transform on the hexagon (FFTH) and its implementation. Firstly, we design a CU- DA FFTH algorithm by utilizing the hierachica! parallelization mechanism and the build-in CUFFT library for classic rectangular FFTs. With respect to the serial cpu program, our CUDA program achieves 12x speedup for 3 × 2048^2 double-precision complex-to-complex FFTH. If we ignore the PCI between main memory and GPU device memory, around 30x- 40x speedup can be even achieved. Although the non-tensorial FFTH is much more complicated than the rectangular FFT, our CUDA FFTH program gains the same efficiency as the rectangular CUFFT. Next, efforts are mainly contributed to optimization techniques for parallel array transposition and data sorting, which significantly improve the efficiency of the CUDA-MPI FFTH algorithm. On a 10-node cluster with 60 GPUs, our CUDA-MPI program achieves about 55x speedup with respect to the the serial cpu program for 3 × 8192^2 complex-to-complex double-precision FFTH, and it is more efficient than the MPI parallel FFTW. Our research on the CUDA-MPI algorithm for FFTH is beneficial to the exploration and development of new parallel algorithms on large-scale CPU-GPU heterogeneous computer systems.
出处 《数值计算与计算机应用》 CSCD 2012年第1期59-72,共14页 Journal on Numerical Methods and Computer Applications
基金 国家自然科学基金(10971212 91130014)资助项目
关键词 六边形区域快速傅里叶变换 CUDA-MPI算法 并行排序 Fast Fourier Transform on the hexagon (FFTH) CUDA-MPI algorithm parallel sorting
  • 相关文献

参考文献3

二级参考文献12

共引文献36

同被引文献34

  • 1孙家昶,姚继锋.平行六边形区域上的快速离散傅立叶变换[J].计算数学,2004,26(3):351-366. 被引量:10
  • 2Jiachang Sun.MULTIVARIATE FOURIER TRANSFORM METHODS OVER SIMPLEX AND SUPER-SIMPLEX DOMAINS[J].Journal of Computational Mathematics,2006,24(3):305-322. 被引量:5
  • 3Canuto C, Hussaini Y, Quarteroni A, Zang T A. Spectral Methods: Fundamentals in Single Domains [M].Berlin: Springer-Verlag, 2006.
  • 4Bernardi C, Maday Y. Spectral methods [M]// Ciarlet P G, Lions J L. Handbook of Numerical Analysis. Amsterdam: Elsevier, 1997: 209-486.
  • 5Canuto C, Hussaini M Y, Quarteroni A, Zang T A. Spectral Methods in Fluid Dynamics [M]. Berlin: Springer-Verlag, 1988.
  • 6Guo B Y. Spectral Methods and Their Applications [M].Singapore: World Scientific, 1998.
  • 7Gottlieb D, Orszag S A. Numerical Analysis of Spectral Methods [M]. Philadephia: Society for Industrial and Applied Mathematics, 1977.
  • 8Karniadakis G, Sherwin S J. Spectral/hp Element Methods for Computational Fluid Dynamics [M]. 2ed ed. Oxford: Oxford University Press, 2005.
  • 9Canuto C, Hussaini M Y, Quarteroni A, Zang T A. Spectral Methods: Evolution to Complex Geometries and Applications to Fluid Dynamics [M]. Berlin: Springer-Verlag, 2007.
  • 10Shen J, Tang T, Wang L L. Spectral Methods: Algorithms, Analysis and Applications [M]. Berlin: Springer-Verlag, 2011.

引证文献4

二级引证文献7

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部