Research on GPU transplantation optimization of PRM scalar advection scheme in GRAPES global forecast system
Authors: Zhangjie Tan, Jinfang Jia, Zhengsheng Ning, Jianqiang Huang, Xiaoying Wang. CCF Transactions on High Performance Computing, 2025, No. 3, pp. 226-244 (19 pages)
With the rise of large AI models, Graphics Processing Units (GPUs) have become the preferred hardware solution for many scientific applications due to their superior floating-point computation capabilities. This paper explores the application of CPU+GPU heterogeneous accelerators in the Global/Regional Assimilation and Prediction System (GRAPES). We moved the main time-consuming part of the system's scalar advection scheme (PRM) to the GPU. Specifically, we performed a detailed performance analysis of the PRM module and then refactored and ported the code to the GPU using C and CUDA C. During this process, we applied a series of optimization methods, including changing the array storage order, optimizing GPU memory access, and merging loops to increase the computation performed per kernel function. Additionally, to reduce communication overhead, we designed a communication-avoidance scheme. The final solution showed good accuracy within acceptable error margins and excellent scalability. On a cluster with Intel(R) Xeon(R) Gold 6326 CPUs and NVIDIA A800 GPUs, using 16 CPU cores and 8 GPU accelerators, we achieved a speedup of up to 87.90 times for the hotspot function and 5.21 times overall for the scalar advection scheme.
Keywords: Scalar Advection Scheme; GPU optimization; Numerical Simulation; heterogeneous optimization; GRAPES
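
The abstract names two of its optimizations, changing array storage order and merging loops, without showing code. The plain C sketch below illustrates what those two ideas can look like in general. It is not the authors' implementation: the grid extents `NI`, `NJ`, `NK` and all function names are hypothetical, and the arithmetic is a stand-in rather than the PRM scheme.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical grid extents, chosen only for illustration. */
enum { NI = 4, NJ = 3, NK = 2 };

/* Fortran-style (column-major) offset of q(i,j,k): i varies fastest. */
size_t idx_col_major(size_t i, size_t j, size_t k) {
    assert(i < NI && j < NJ && k < NK);
    return i + NI * (j + NJ * k);
}

/* C-style (row-major) offset of q[i][j][k]: k varies fastest. */
size_t idx_row_major(size_t i, size_t j, size_t k) {
    assert(i < NI && j < NJ && k < NK);
    return k + NK * (j + NJ * i);
}

/* Copy a field from column-major to row-major storage. */
void reorder_to_row_major(const double *src, double *dst) {
    for (size_t i = 0; i < NI; ++i)
        for (size_t j = 0; j < NJ; ++j)
            for (size_t k = 0; k < NK; ++k)
                dst[idx_row_major(i, j, k)] = src[idx_col_major(i, j, k)];
}

/* Two separate passes: out is written, then read and written again. */
void scale_then_shift(const double *q, double *out, size_t n,
                      double a, double b) {
    for (size_t m = 0; m < n; ++m) out[m] = a * q[m];
    for (size_t m = 0; m < n; ++m) out[m] += b;
}

/* Merged loop: the same arithmetic in a single pass over memory. */
void scale_shift_fused(const double *q, double *out, size_t n,
                       double a, double b) {
    for (size_t m = 0; m < n; ++m) out[m] = a * q[m] + b;
}
```

Which index should vary fastest on a GPU depends on how threads are mapped to grid points, so the reordering direction shown here is only an example of the mechanics. The merged loop does more work per iteration and touches `out` once instead of twice, which is the general motivation for loop merging before turning a loop body into a kernel.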