Recently,researchers have shown increasing interest in combining more than one programming model into systems running on high performance computing systems(HPCs)to achieve exascale by applying parallelism at multiple ...Recently,researchers have shown increasing interest in combining more than one programming model into systems running on high performance computing systems(HPCs)to achieve exascale by applying parallelism at multiple levels.Combining different programming paradigms,such as Message Passing Interface(MPI),Open Multiple Processing(OpenMP),and Open Accelerators(OpenACC),can increase computation speed and improve performance.During the integration of multiple models,the probability of runtime errors increases,making their detection difficult,especially in the absence of testing techniques that can detect these errors.Numerous studies have been conducted to identify these errors,but no technique exists for detecting errors in three-level programming models.Despite the increasing research that integrates the three programming models,MPI,OpenMP,and OpenACC,a testing technology to detect runtime errors,such as deadlocks and race conditions,which can arise from this integration has not been developed.Therefore,this paper begins with a definition and explanation of runtime errors that result fromintegrating the three programming models that compilers cannot detect.For the first time,this paper presents a classification of operational errors that can result from the integration of the three models.This paper also proposes a parallel hybrid testing technique for detecting runtime errors in systems built in the C++programming language that uses the triple programming models MPI,OpenMP,and OpenACC.This hybrid technology combines static technology and dynamic technology,given that some errors can be detected using static techniques,whereas others can be detected using dynamic technology.The hybrid technique can detect more errors because it combines two distinct technologies.The proposed static technology detects a wide range of error types in less time,whereas a portion of the potential errors that may or may not occur depending on the 4502 CMC,2023,vol.74,no.2 operating environment are left to the dynamic technology,which completes the validation.展开更多
本文研究了联合卫星观测数据和重力异常数据确定超高阶重力场模型的理论方法,并使用EGM2008模型重力异常和GOCE(gravity field and ocean circulation explorer)观测数据构建了重力场模型SGG-UGM-1。重点研究了由球面格网重力异常快速...本文研究了联合卫星观测数据和重力异常数据确定超高阶重力场模型的理论方法,并使用EGM2008模型重力异常和GOCE(gravity field and ocean circulation explorer)观测数据构建了重力场模型SGG-UGM-1。重点研究了由球面格网重力异常快速构建超高阶重力场模型的块对角最小二乘方法,将OpenMP技术引入到块对角最小二乘中以提高计算效率,并基于模拟数据验证了方法及算法和软件模块的正确性。采用本文制定的联合解算策略,利用GOCE重力卫星观测数据构建的220阶次法方程和EGM2008模型重力异常构建的2159阶次块对角法方程,联合求解了2159阶次的重力场模型SGG-UGM-1。将SGG-UGM-1与EGM2008、EIGEN-6C2、EIGEN-6C4等超高阶模型在频谱域内进行了比较分析,结果表明SGG-UGM-1相对参考模型的系数误差较小,且在220阶次内的系数精度相比EGM2008模型有了提高。采用中国与美国的GPS/水准数据和毛乌素测区的航空重力观测数据对这些模型进行了外符合精度的检验。检核结果表明,在中国区域,SGG-UGM-1模型大地水准面的精度在EIGEN-6C2和EIGEN-6C4两个模型之间,优于GOSG-EGM模型和EGM2008模型,与美国区域几个模型的精度相当。利用毛乌素测区的航空重力数据对几个模型进行了检核,结果表明SGG-UGM-1模型计算的重力扰动精度与EGM2008、EIGEN-6C4模型相当,优于GOSG-EGM模型和EIGEN-6C2模型。展开更多
The use of multi-core processors will become a trend in safety critical systems. For safe execution of multi- threaded code, automatic code generation from formal spec- ification is a desirable method. Signal, a synch...The use of multi-core processors will become a trend in safety critical systems. For safe execution of multi- threaded code, automatic code generation from formal spec- ification is a desirable method. Signal, a synchronous lan- guage dedicated for the functional description of safety crit- ical systems, provides soundness semantics for determinis- tic concurrency. Although sequential code generation of Sig- nal has been implemented in Polychrony compiler, deter- ministic multi-threaded code generation strategy is still far from mature. Moreover, existing code generation methods use certain multi-thread library, which limits the cross plat- form executions. OpenMP is an application program inter- face (API) standard for parallel programming, supported by several mainstream compilers from different platforms. This paper presents a methodology translating Signal program to OpenMP-based multi-threaded C code. First, the intermedi- ate representation of the core syntax of Signal using syn- chronous guarded actions is defined. Then, according to the compositional semantics of Signal equations, the Signal pro- gram is synthesized to dependency graph (DG). After par- allel tasks are extracted from dependency graph, the Signal program can be finally translated into OpenMP-based C code which can be executed on multiple platforms.展开更多
The leading way to achieve thread-level parallelism on the Sunwayhigh-performance multicore processors is to use OpenMP programming techniques.In order to address the problem of low parallel efficiency caused by hight...The leading way to achieve thread-level parallelism on the Sunwayhigh-performance multicore processors is to use OpenMP programming techniques.In order to address the problem of low parallel efficiency caused by highthread group control overhead in the compilation of Sunway OpenMP programs,this paper proposes the parallel region reconstruction technique. The parallelregion reconstruction technique expands the parallel scope of parallel regionsin OpenMP programs by parallel region merging and parallel region extending.Moreover, it reduces the number of parallel regions in OpenMP programs,decreases the overhead of frequent creation and convergence of thread groups,and converts standard fork-join model OpenMP programs to higher performanceSPMD modelOpenMP programs. On the Sunway 1621 server computer, NPB3.3-OMP and SPEC OMP2012 achieved 8.9% and 7.9% running efficiency improvementrespectively through parallel region reconstruction technique. As a result,the parallel region reconstruction technique is feasible and effective. It providestechnical support to fully exploit the multi-core parallelism advantage of Sunway’shigh-performance processors.展开更多
The primary way to achieve thread-level parallelism on the Sunwayhigh-performance multicore processor is to use the OpenMP programming technique.To address the problem of low parallelism efficiency caused by slow acce...The primary way to achieve thread-level parallelism on the Sunwayhigh-performance multicore processor is to use the OpenMP programming technique.To address the problem of low parallelism efficiency caused by slow accessto thread private variables in the compilation of Sunway OpenMP programs, thispaper proposes a thread private variable access technique based on privilegedinstructions. The privileged instruction-based thread-private variable access techniquecentralizes the implementation of thread-private variables at the compilerlevel, eliminating the model switching overhead of invoking OS core processingand improving the speed of accessing thread-private variables. On the Sunway1621 server platform, NPB3.3-OMP and SPEC OMP2012 achieved 6.2% and6.8% running efficiency gains, respectively. The results show that the techniquesproposed in this paper can provide technical support for giving full play to theadvantages of Sunway’s high-performance multi-core processors.展开更多
基金[King Abdulaziz University][Deanship of Scientific Research]Grant Number[KEP-PHD-20-611-42].
文摘Recently,researchers have shown increasing interest in combining more than one programming model into systems running on high performance computing systems(HPCs)to achieve exascale by applying parallelism at multiple levels.Combining different programming paradigms,such as Message Passing Interface(MPI),Open Multiple Processing(OpenMP),and Open Accelerators(OpenACC),can increase computation speed and improve performance.During the integration of multiple models,the probability of runtime errors increases,making their detection difficult,especially in the absence of testing techniques that can detect these errors.Numerous studies have been conducted to identify these errors,but no technique exists for detecting errors in three-level programming models.Despite the increasing research that integrates the three programming models,MPI,OpenMP,and OpenACC,a testing technology to detect runtime errors,such as deadlocks and race conditions,which can arise from this integration has not been developed.Therefore,this paper begins with a definition and explanation of runtime errors that result fromintegrating the three programming models that compilers cannot detect.For the first time,this paper presents a classification of operational errors that can result from the integration of the three models.This paper also proposes a parallel hybrid testing technique for detecting runtime errors in systems built in the C++programming language that uses the triple programming models MPI,OpenMP,and OpenACC.This hybrid technology combines static technology and dynamic technology,given that some errors can be detected using static techniques,whereas others can be detected using dynamic technology.The hybrid technique can detect more errors because it combines two distinct technologies.The proposed static technology detects a wide range of error types in less time,whereas a portion of the potential errors that may or may not occur depending on the 4502 CMC,2023,vol.74,no.2 operating environment are left to the dynamic technology,which completes the validation.
文摘本文研究了联合卫星观测数据和重力异常数据确定超高阶重力场模型的理论方法,并使用EGM2008模型重力异常和GOCE(gravity field and ocean circulation explorer)观测数据构建了重力场模型SGG-UGM-1。重点研究了由球面格网重力异常快速构建超高阶重力场模型的块对角最小二乘方法,将OpenMP技术引入到块对角最小二乘中以提高计算效率,并基于模拟数据验证了方法及算法和软件模块的正确性。采用本文制定的联合解算策略,利用GOCE重力卫星观测数据构建的220阶次法方程和EGM2008模型重力异常构建的2159阶次块对角法方程,联合求解了2159阶次的重力场模型SGG-UGM-1。将SGG-UGM-1与EGM2008、EIGEN-6C2、EIGEN-6C4等超高阶模型在频谱域内进行了比较分析,结果表明SGG-UGM-1相对参考模型的系数误差较小,且在220阶次内的系数精度相比EGM2008模型有了提高。采用中国与美国的GPS/水准数据和毛乌素测区的航空重力观测数据对这些模型进行了外符合精度的检验。检核结果表明,在中国区域,SGG-UGM-1模型大地水准面的精度在EIGEN-6C2和EIGEN-6C4两个模型之间,优于GOSG-EGM模型和EGM2008模型,与美国区域几个模型的精度相当。利用毛乌素测区的航空重力数据对几个模型进行了检核,结果表明SGG-UGM-1模型计算的重力扰动精度与EGM2008、EIGEN-6C4模型相当,优于GOSG-EGM模型和EIGEN-6C2模型。
文摘The use of multi-core processors will become a trend in safety critical systems. For safe execution of multi- threaded code, automatic code generation from formal spec- ification is a desirable method. Signal, a synchronous lan- guage dedicated for the functional description of safety crit- ical systems, provides soundness semantics for determinis- tic concurrency. Although sequential code generation of Sig- nal has been implemented in Polychrony compiler, deter- ministic multi-threaded code generation strategy is still far from mature. Moreover, existing code generation methods use certain multi-thread library, which limits the cross plat- form executions. OpenMP is an application program inter- face (API) standard for parallel programming, supported by several mainstream compilers from different platforms. This paper presents a methodology translating Signal program to OpenMP-based multi-threaded C code. First, the intermedi- ate representation of the core syntax of Signal using syn- chronous guarded actions is defined. Then, according to the compositional semantics of Signal equations, the Signal pro- gram is synthesized to dependency graph (DG). After par- allel tasks are extracted from dependency graph, the Signal program can be finally translated into OpenMP-based C code which can be executed on multiple platforms.
文摘The leading way to achieve thread-level parallelism on the Sunwayhigh-performance multicore processors is to use OpenMP programming techniques.In order to address the problem of low parallel efficiency caused by highthread group control overhead in the compilation of Sunway OpenMP programs,this paper proposes the parallel region reconstruction technique. The parallelregion reconstruction technique expands the parallel scope of parallel regionsin OpenMP programs by parallel region merging and parallel region extending.Moreover, it reduces the number of parallel regions in OpenMP programs,decreases the overhead of frequent creation and convergence of thread groups,and converts standard fork-join model OpenMP programs to higher performanceSPMD modelOpenMP programs. On the Sunway 1621 server computer, NPB3.3-OMP and SPEC OMP2012 achieved 8.9% and 7.9% running efficiency improvementrespectively through parallel region reconstruction technique. As a result,the parallel region reconstruction technique is feasible and effective. It providestechnical support to fully exploit the multi-core parallelism advantage of Sunway’shigh-performance processors.
文摘The primary way to achieve thread-level parallelism on the Sunwayhigh-performance multicore processor is to use the OpenMP programming technique.To address the problem of low parallelism efficiency caused by slow accessto thread private variables in the compilation of Sunway OpenMP programs, thispaper proposes a thread private variable access technique based on privilegedinstructions. The privileged instruction-based thread-private variable access techniquecentralizes the implementation of thread-private variables at the compilerlevel, eliminating the model switching overhead of invoking OS core processingand improving the speed of accessing thread-private variables. On the Sunway1621 server platform, NPB3.3-OMP and SPEC OMP2012 achieved 6.2% and6.8% running efficiency gains, respectively. The results show that the techniquesproposed in this paper can provide technical support for giving full play to theadvantages of Sunway’s high-performance multi-core processors.