Funding: Supported by a fund from Shenyang Mint Company Limited (No. 20220056), the Senior Talent Foundation of Jiangsu University (No. 19JDG022), and the Taizhou City Double Innovation and Entrepreneurship Talent Program (Taizhou Human Resources Office [2022] No. 22).
Abstract: We present pure Open Multi-Processing (OpenMP), pure Message Passing Interface (MPI), and hybrid MPI/OpenMP parallel solvers for the coining process, built on the dynamic explicit central difference algorithm, to address the challenge of capturing fine relief features of approximately 50 microns. Resolving such features requires at least 7 million tetrahedral elements, which exceeds the capabilities of the previously developed serial programs. To avoid data races when calculating internal forces, intermediate arrays are introduced within the OpenMP directive, so that threads do not write to the same memory locations during parallel execution. In the MPI implementation, the coin is partitioned into the desired number of regions, which distributes the computational work across multiple processes. Numerical simulation examples compare the three solvers with the serial program in terms of correctness, speedup ratio, and parallel efficiency. The forming force predicted by the parallel and serial solvers differs by a relative error of approximately 0.3%, and the predicted insufficient-material zones agree with experimental observations. The pure MPI solver achieves a maximum speedup of 9.5 on a single computer (12 cores), and the hybrid solver reaches a speedup ratio of 136 on a cluster (6 compute nodes with 12 cores each), showing the strong scalability of the hybrid MPI/OpenMP programming model. The approach meets the simulation requirements for commemorative coins with intricate relief patterns.
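The abstract does not spell out how the intermediate arrays are organized, so the following is only a minimal sketch of one common pattern, written as serial NumPy code to show the data layout rather than the authors' OpenMP implementation. It assumes each tetrahedron first stores its four nodal force contributions in its own slot of an intermediate array (no two elements write to the same location), and that a second loop over nodes then sums the slots that refer to each node. The function name `assemble_internal_forces` and the array shapes are hypothetical.

```python
import numpy as np


def assemble_internal_forces(conn, elem_forces, n_nodes):
    """Sum per-element nodal contributions into a global nodal force array.

    conn        : (n_elem, 4) int array, node ids of each tetrahedron
    elem_forces : (n_elem, 4, 3) intermediate array; slot [e, a] holds the
                  force that element e contributes to its local node a
    n_nodes     : total number of nodes
    """
    # Phase 1 (assumed to happen earlier, in an element loop): every element
    # writes only to its own slots of elem_forces, so that loop has no
    # write conflicts and could be run in parallel.

    # Build a node -> slot map: slots sorted by the node they belong to.
    flat_nodes = conn.ravel()
    order = np.argsort(flat_nodes, kind="stable")
    counts = np.bincount(flat_nodes, minlength=n_nodes)
    starts = np.concatenate(([0], np.cumsum(counts)))

    # Phase 2 (node loop): each node reads its slots and writes only its own
    # row of f, so this loop is also free of write conflicts.
    contrib = elem_forces.reshape(-1, 3)
    f = np.zeros((n_nodes, 3))
    for n in range(n_nodes):
        slots = order[starts[n]:starts[n + 1]]
        f[n] = contrib[slots].sum(axis=0)
    return f


# Two tetrahedra sharing nodes 1, 2, and 3, each contributing unit forces.
conn = np.array([[0, 1, 2, 3], [1, 2, 3, 4]])
elem_forces = np.ones((2, 4, 3))
forces = assemble_internal_forces(conn, elem_forces, n_nodes=5)
```

The trade-off in this layout is extra memory for the intermediate array in exchange for avoiding atomic updates or locks on the shared nodal force array.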
Funding: Funded by the Key Project of the Open Fund for Computer Technology Applications in Yunnan under Grant No. CB23031D025A.
Abstract: Encoding and decoding algorithms for the Z-curve are of primary importance in many Z-curve-based applications. Bit interleaving is the current state-of-the-art algorithm for encoding and decoding the Z-curve. Although simple, its efficiency is limited by step-by-step coordinate shifting and bitwise operations. To tackle this problem, we first propose the efficient encoding algorithm LTFe and the corresponding decoding algorithm LTFd, which adopt two optimizations: 1) lookup tables (LT) that convert encoding and decoding operations into table-lookup operations; 2) a bit detection mechanism that skips the high-order part of a coordinate or Z-value that consists of consecutive leading 0s, avoiding unnecessary iterations. We further propose order-parallel and point-parallel OpenMP-based algorithms to exploit modern multi-core hardware. Experimental results on discrete, skewed, and real datasets indicate that our point-parallel algorithms can be up to 12.6× faster than existing algorithms.
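The two ideas named in the abstract, table lookup and skipping leading zeros, can be illustrated with a generic 2D Z-curve (Morton) codec. This is a sketch under stated assumptions, not the LTFe/LTFd algorithms themselves, whose details the abstract does not give: the 8-bit table width, the convention that x occupies the even bit positions and y the odd ones, and all function and table names are choices made here for illustration.

```python
def _spread_byte(b):
    """Move bit i of an 8-bit value to bit 2*i (leaves odd positions empty)."""
    out = 0
    for i in range(8):
        out |= ((b >> i) & 1) << (2 * i)
    return out


def _compact_byte(b):
    """Collect the even-position bits of an 8-bit value into 4 bits."""
    out = 0
    for i in range(4):
        out |= ((b >> (2 * i)) & 1) << i
    return out


# Lookup tables built once; each has 256 entries.
SPREAD = [_spread_byte(b) for b in range(256)]
COMPACT = [_compact_byte(b) for b in range(256)]


def encode_bitwise(x, y, bits=32):
    """Reference bit-by-bit interleaving, one coordinate bit per step."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def encode_lut(x, y):
    """Interleave one byte of x and y per step using the SPREAD table."""
    z = 0
    shift = 0
    # The loop ends as soon as no set bits remain in either coordinate,
    # so leading zero bytes are never processed.
    while x or y:
        z |= (SPREAD[x & 0xFF] | (SPREAD[y & 0xFF] << 1)) << shift
        x >>= 8
        y >>= 8
        shift += 16
    return z


def decode_lut(z):
    """Recover (x, y) from a Z-value one byte at a time using COMPACT."""
    x = y = 0
    shift = 0
    while z:  # stops early when the remaining high-order bits are all 0
        byte = z & 0xFF
        x |= COMPACT[byte] << shift        # even bits belong to x
        y |= COMPACT[byte >> 1] << shift   # odd bits belong to y
        z >>= 8
        shift += 4
    return x, y
```

For example, `encode_lut(3, 5)` interleaves `011` and `101` into the Z-value 39, and `decode_lut(39)` returns `(3, 5)`. Small coordinates finish in a single table lookup, whereas the bitwise reference always runs the full number of iterations.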