Smoothed Particle Hydrodynamics (SPH) is fast emerging as a practically useful computational simulation tool for a wide variety of engineering problems. SPH is also gaining popularity as the backbone for fast and realistic animations in graphics and video games. The Lagrangian and mesh-free nature of the method facilitates fast and accurate simulation of material deformation, interface capture, etc. Particle-based methods typically require efficient particle search-and-locate algorithms, because the continuous re-creation of neighbor particle lists is a computationally expensive step. Hence, it is advantageous to implement SPH on modern multi-core platforms with the help of High-Performance Computing (HPC) tools. In this work, the computational performance of an SPH algorithm is assessed on a multi-core Central Processing Unit (CPU) as well as on massively parallel General-Purpose Graphics Processing Units (GP-GPUs). Parallelizing SPH poses several challenges, such as scalability of the neighbor search process, force calculations, minimizing thread divergence, achieving coalesced memory access patterns, balancing workload, and ensuring optimum use of computational resources. While addressing some of these challenges, performance metrics such as speedup, global load efficiency, global store efficiency, warp execution efficiency, and occupancy are analyzed in detail. The OpenMP and Compute Unified Device Architecture (CUDA) parallel programming models are used for parallel computing on an Intel Xeon(R) E5-2630 multi-core CPU and on NVIDIA Quadro M4000 and NVIDIA Tesla P100 massively parallel GPU architectures. Standard benchmark problems from the Computational Fluid Dynamics (CFD) literature are chosen for validation. The work addresses the key question of how to identify a suitable architecture for mesh-less methods, which inherently involve a heavy workload of neighbor search and evaluation of local force fields from neighbor interactions.
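As a rough illustration of why the neighbor search dominates the cost, the following is a minimal C++ sketch of a uniform-grid (cell-list) neighbor search next to a brute-force reference. It is not the implementation evaluated in this work. It assumes a 2D rectangular domain, a cell size equal to the support radius `h`, and particles lying inside `[0, lx) x [0, ly)`. The names `Particle`, `build_neighbor_lists`, and `brute_force_neighbor_lists`, as well as the OpenMP scheduling clause, are hypothetical choices made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Particle { double x, y; };  // hypothetical minimal particle type

// Reference O(N^2) search: every pair is tested against the radius h.
std::vector<std::vector<int>> brute_force_neighbor_lists(
    const std::vector<Particle>& p, double h) {
  const int n = static_cast<int>(p.size());
  std::vector<std::vector<int>> nbrs(n);
  for (int i = 0; i < n; ++i)
    for (int j = 0; j < n; ++j) {
      const double rx = p[i].x - p[j].x, ry = p[i].y - p[j].y;
      if (j != i && rx * rx + ry * ry < h * h) nbrs[i].push_back(j);
    }
  return nbrs;
}

// Cell-list search: bin particles into cells of width h, then test only
// the 3x3 block of cells around each particle.
// Assumption: all particles lie inside [0, lx) x [0, ly).
std::vector<std::vector<int>> build_neighbor_lists(
    const std::vector<Particle>& p, double h, double lx, double ly) {
  assert(h > 0.0 && lx > 0.0 && ly > 0.0);
  const int nx = std::max(1, static_cast<int>(std::ceil(lx / h)));
  const int ny = std::max(1, static_cast<int>(std::ceil(ly / h)));
  const int n = static_cast<int>(p.size());

  // Clamp guards against round-off at the upper domain boundary.
  auto clamp_index = [](int v, int hi) { return std::min(hi, std::max(0, v)); };

  // Counting sort of particle indices by cell id.
  std::vector<int> cell_of(n), cell_start(nx * ny + 1, 0);
  for (int i = 0; i < n; ++i) {
    const int cx = clamp_index(static_cast<int>(p[i].x / h), nx - 1);
    const int cy = clamp_index(static_cast<int>(p[i].y / h), ny - 1);
    cell_of[i] = cy * nx + cx;
    ++cell_start[cell_of[i] + 1];
  }
  for (int c = 0; c < nx * ny; ++c) cell_start[c + 1] += cell_start[c];
  std::vector<int> sorted(n);
  std::vector<int> cursor(cell_start.begin(), cell_start.end() - 1);
  for (int i = 0; i < n; ++i) sorted[cursor[cell_of[i]]++] = i;

  // Each iteration writes only nbrs[i], so the loop is free of data races.
  // The pragma is ignored when the code is compiled without OpenMP.
  std::vector<std::vector<int>> nbrs(n);
  #pragma omp parallel for schedule(dynamic, 64)
  for (int i = 0; i < n; ++i) {
    const int cx = cell_of[i] % nx, cy = cell_of[i] / nx;
    for (int dy = -1; dy <= 1; ++dy)
      for (int dx = -1; dx <= 1; ++dx) {
        const int jx = cx + dx, jy = cy + dy;
        if (jx < 0 || jx >= nx || jy < 0 || jy >= ny) continue;
        const int c = jy * nx + jx;
        for (int k = cell_start[c]; k < cell_start[c + 1]; ++k) {
          const int j = sorted[k];
          const double rx = p[i].x - p[j].x, ry = p[i].y - p[j].y;
          if (j != i && rx * rx + ry * ry < h * h) nbrs[i].push_back(j);
        }
      }
  }
  return nbrs;
}
```

Calling `build_neighbor_lists(particles, h, lx, ly)` returns, for each particle, the indices of all other particles closer than `h`. The order within each list follows the cell traversal, so sort the lists before comparing them with the brute-force result. The per-cell lists vary in length, which is one source of the workload imbalance and thread divergence mentioned in the abstract. The dynamic schedule is one possible way to mitigate this on a CPU, not a recommendation drawn from the paper.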