Funding: Sponsored by the Research Fund of the State Key Laboratory of Mechanics and Control of Mechanical Structures (Nanjing University of Aeronautics and Astronautics) (Grant No. MCMS-0117G01).
Abstract: This paper implements parallel acceleration of well-developed serial codes for the numerical simulation of fluid dynamics problems. The flow field is solved with the lattice Boltzmann method (LBM). The OpenACC (Open Accelerators) application programming interface, a relatively new standard for parallel computing, is adopted to perform the acceleration. By parallelizing the computation-intensive loops and reducing unnecessary data movement, the LBM-based serial codes are markedly accelerated. Several benchmark problems are simulated with the OpenACC programming model to assess performance and computational efficiency. After optimization, a reasonable speedup is obtained over the original version.
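For readers unfamiliar with the method, the following is a minimal sketch of the kind of loop nest an LBM code spends its time in. It is not the paper's code: the D2Q9 lattice, the BGK collision operator, the periodic boundaries, and every name in it (`equilibrium`, `lbm_step`, `total_mass`, `tau`) are assumptions chosen for illustration, and it is written in plain Python rather than as directive-annotated serial code.

```python
# D2Q9 lattice: discrete velocities and their standard weights.
VELOCITIES = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
              (1, 1), (-1, 1), (-1, -1), (1, -1)]
WEIGHTS = [4 / 9] + [1 / 9] * 4 + [1 / 36] * 4


def equilibrium(rho, ux, uy):
    """Second-order equilibrium populations for one lattice node."""
    usq = ux * ux + uy * uy
    feq = []
    for (cx, cy), w in zip(VELOCITIES, WEIGHTS):
        cu = cx * ux + cy * uy
        feq.append(w * rho * (1.0 + 3.0 * cu + 4.5 * cu * cu - 1.5 * usq))
    return feq


def lbm_step(f, nx, ny, tau):
    """One BGK collision plus streaming step on a periodic nx-by-ny grid.

    f[x][y][i] is the population moving along VELOCITIES[i] at node (x, y).
    """
    f_new = [[[0.0] * 9 for _ in range(ny)] for _ in range(nx)]
    # Each (x, y) iteration reads only f[x][y] and writes entries of f_new
    # that no other iteration writes, so the iterations are independent.
    # This is the kind of loop nest a directive-based model would offload.
    for x in range(nx):
        for y in range(ny):
            node = f[x][y]
            rho = sum(node)
            ux = sum(fi * c[0] for fi, c in zip(node, VELOCITIES)) / rho
            uy = sum(fi * c[1] for fi, c in zip(node, VELOCITIES)) / rho
            feq = equilibrium(rho, ux, uy)
            for i, (cx, cy) in enumerate(VELOCITIES):
                post = node[i] - (node[i] - feq[i]) / tau  # BGK relaxation
                # Push the post-collision value to the neighbouring node.
                f_new[(x + cx) % nx][(y + cy) % ny][i] = post
    return f_new


def total_mass(f):
    """Sum of all populations; conserved by collision and streaming."""
    return sum(sum(sum(node) for node in column) for column in f)
```

In this sketch the two arrays `f` and `f_new` are the only large data structures. Keeping such arrays resident on the accelerator across time steps, instead of copying them at every step, is one plausible reading of the reduction in data movement that the abstract mentions.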
Funding: Supported by the National Key R&D Program of China (Grant No. 2017YFB02-02004), the Project of Manned Space Engineering Technology (2018-14) "Large-scale parallel computation of aerodynamic problems of irregular spacecraft reentry covering various flow regimes", and the National Natural Science Foundation of China (91530319).
Abstract: OpenACC has become a popular programming interface for many-core application programming. Internationally, much research has been done on OpenACC for CPU+GPU heterogeneous many-core architectures; among the resulting tools, the PGI OpenACC compiler developed by NVIDIA is the most advanced. However, there is little research on OpenACC for the Home Grown Heterogeneous Many-Core (HGHM) architecture, which differs from a GPU. This paper proposes an automatic mapping technique, built on the OpenACC compiler, that maps OpenACC kernel code to a heterogeneous and deeply fused many-core architecture. The approach uses the compiler's static analysis and feedback-driven dynamic analysis to map a program's parallel kernel code to many-core devices automatically, which greatly improves the transformation quality of the compiler. Experimental results show that the technique greatly improves the efficiency of porting applications to heterogeneous, fused many-core systems with OpenACC, without degrading the accelerated performance of the program.
Funding: Supported by the decision support project of response to climate change of China, the National Natural Science Foundation of China (Nos. 41674085, 41604009, and 41621091), the Natural Science Foundation of Qinghai Province (No. 2019-ZJ-7034), and the Open Project of the State Key Laboratory of Plateau Ecology and Agriculture, Qinghai University (No. 2020-zz-03).
Abstract: A moisture advection scheme is an essential module of a numerical weather/climate model, representing the horizontal transport of water vapor. The Piecewise Rational Method (PRM) scalar advection scheme in the Global/Regional Assimilation and Prediction System (GRAPES) solves the moisture flux advection equation based on PRM. Computing the scalar advection involves boundary exchange and has high bandwidth requirements, which makes it complicated and time-consuming in GRAPES. Recently, Graphics Processing Units (GPUs) have been widely used to solve scientific and engineering computing problems, owing to advances in GPU hardware and related programming models such as CUDA/OpenCL and Open Accelerator (OpenACC). Herein, we present an accelerated PRM scalar advection scheme that uses the Message Passing Interface (MPI) and OpenACC to fully exploit the power of GPUs on a cluster with multiple Central Processing Units (CPUs) and GPUs, together with optimizations such as minimizing data transfer, memory coalescing, exposing more parallelism, and overlapping computation with data transfers. Results show a speedup of about 3.5 times for the entire model running at medium resolution in double precision when the scheme's elapsed times are compared on a node with two GPUs (NVIDIA P100) and two 16-core CPUs (Intel Gold 6142). Further, experiments with a higher-resolution model on multiple GPUs show excellent scalability.