
GPU-based preconditioned conjugate gradient method for solving sparse linear systems

Cited by: 11
Abstract: This paper studies GPU acceleration of the preconditioned conjugate gradient (PCG) method for solving sparse linear systems. The programs were written on the Compute Unified Device Architecture (CUDA) platform, and their performance was tested and analyzed on an NVIDIA GT430 GPU. The sparse matrix is stored in the Compressed Sparse Row (CSR) format. Based on the algorithmic characteristics of PCG, the study examines how to optimize sparse matrix-vector multiplication on the GPU and how to speed up data transfer from the CPU to the GPU. Compared with the cusparseDcsrmv function from the CUSPARSE library, the self-developed sparse matrix-vector multiplication kernel achieves a speed-up of up to 2.1 times. For the complete PCG method, the implementation built on self-developed kernels is slightly faster than the one built on the CUBLAS and CUSPARSE libraries, and it achieves a speed-up of up to 7.4 times over the CPU implementation of PCG.
Source: Journal of Computer Applications (《计算机应用》), CSCD, Peking University Core Journal, 2013, No. 3, pp. 825-829 (5 pages)
Funding: National Natural Science Foundation of China (51109072)
Keywords: Graphic Processing Unit (GPU); sparse linear equations; preconditioned conjugate gradient method; Compressed Sparse Row (CSR); Compute Unified Device Architecture (CUDA)
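To make the two building blocks named in the abstract concrete, the following is a minimal CPU reference sketch of CSR sparse matrix-vector multiplication and a PCG loop built on it. It is not the authors' CUDA kernel: it is plain Python, the function names (`csr_matvec`, `csr_diagonal`, `pcg_jacobi`) are hypothetical, and the Jacobi (diagonal) preconditioner is an assumption, since the abstract does not say which preconditioner was used. The matrix is assumed to be symmetric positive definite with a nonzero diagonal.

```python
from math import sqrt


def csr_matvec(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in CSR format.

    row_ptr[i]..row_ptr[i+1] delimits the nonzeros of row i;
    col_idx and values hold their column indices and entries.
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


def dot(u, v):
    """Inner product of two equally sized vectors."""
    return sum(a * b for a, b in zip(u, v))


def csr_diagonal(row_ptr, col_idx, values):
    """Extract the main diagonal of a CSR matrix."""
    n_rows = len(row_ptr) - 1
    diag = [0.0] * n_rows
    for i in range(n_rows):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            if col_idx[k] == i:
                diag[i] = values[k]
    return diag


def pcg_jacobi(row_ptr, col_idx, values, b, tol=1e-10, max_iter=1000):
    """Solve A x = b with PCG, using a Jacobi preconditioner (assumption).

    Returns the approximate solution and the number of iterations performed.
    """
    n = len(b)
    inv_diag = [1.0 / d for d in csr_diagonal(row_ptr, col_idx, values)]
    x = [0.0] * n
    r = list(b)                                   # residual for x = 0
    z = [inv_diag[i] * r[i] for i in range(n)]    # preconditioned residual
    p = list(z)                                   # initial search direction
    rz = dot(r, z)
    b_norm = sqrt(dot(b, b)) or 1.0
    for it in range(max_iter):
        if sqrt(dot(r, r)) <= tol * b_norm:       # relative residual test
            return x, it
        Ap = csr_matvec(row_ptr, col_idx, values, p)
        alpha = rz / dot(p, Ap)
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [z[i] + beta * p[i] for i in range(n)]
        rz = rz_new
    return x, max_iter


# Usage: the 3x3 tridiagonal matrix [[4,-1,0],[-1,4,-1],[0,-1,4]] in CSR form.
row_ptr = [0, 2, 5, 7]
col_idx = [0, 1, 0, 1, 2, 1, 2]
values = [4.0, -1.0, -1.0, 4.0, -1.0, -1.0, 4.0]
b = csr_matvec(row_ptr, col_idx, values, [1.0, 2.0, 3.0])
x, iterations = pcg_jacobi(row_ptr, col_idx, values, b)
```

Each PCG iteration performs one sparse matrix-vector product plus a few dot products and vector updates, which is why the abstract singles out the matrix-vector kernel as the main optimization target. In a GPU version, the outer loop of `csr_matvec` is the part that would typically be distributed across threads.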

