Abstract
Sunway TaihuLight is a supercomputer independently developed by China. Its processor, the SW26010 heterogeneous many-core processor, was also developed domestically. Each processor contains four core groups, and each core group consists of one management processing element (MPE) and 64 computing processing elements (CPEs). The NPB-FT program solves a three-dimensional partial differential equation with the Fast Fourier Transform and is widely used to evaluate the computing and collective-communication capabilities of clusters. Using FT to test the multi-level parallel resources and the architectural performance of Sunway TaihuLight is therefore of considerable significance. First, the program is rewritten into a master-slave (MPE-CPE) version with the Athread acceleration thread library, so that the computational kernel runs on the CPEs. Second, the data transposition in FT is eliminated by using CPE register communication and the data transfer channel between the MPE and the CPEs. Third, computation is overlapped with communication, so that the computing resources within a core do not sit idle during inter-core communication. Finally, vectorization and instruction pipelining are applied to increase data-level and instruction-level parallelism. Experimental results show a speedup of 66 for the 3D-32 problem size on a single core, 20 for 3D-512 on 64 cores, and 46 for 3D-2048 on 256 cores.
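The core of FT is a 3D FFT, which factors into three passes of 1D FFTs, one along each axis. The following is a minimal pure-Python sketch of that decomposition, not the authors' Athread code; the function names `fft1d`, `fft3d`, and `dft3d_naive` are hypothetical, and power-of-two sizes are assumed. It is meant only to show where the transposition mentioned above typically comes from: when the grid is split into slabs along one axis, the FFT pass along that axis needs data held by other workers.

```python
import cmath
import math


def fft1d(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    even = fft1d(x[0::2])
    odd = fft1d(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]  # twiddle factor
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fft3d(a):
    """3D FFT of a nested list a[i][j][k], computed as three passes of 1D FFTs."""
    n0, n1, n2 = len(a), len(a[0]), len(a[0][0])
    # Pass 1: along k, the innermost (contiguous) axis.
    b = [[fft1d(a[i][j]) for j in range(n1)] for i in range(n0)]
    # Pass 2: along j; these values would be strided in a flat row-major array.
    for i in range(n0):
        for k in range(n2):
            col = fft1d([b[i][j][k] for j in range(n1)])
            for j in range(n1):
                b[i][j][k] = col[j]
    # Pass 3: along i. With a slab decomposition over i, this axis spans
    # several workers, which is where a transpose (all-to-all) usually appears.
    for j in range(n1):
        for k in range(n2):
            col = fft1d([b[i][j][k] for i in range(n0)])
            for i in range(n0):
                b[i][j][k] = col[i]
    return b


def dft3d_naive(a):
    """Direct 3D DFT, used only to check fft3d on small inputs."""
    n0, n1, n2 = len(a), len(a[0]), len(a[0][0])
    out = [[[0j] * n2 for _ in range(n1)] for _ in range(n0)]
    for u in range(n0):
        for v in range(n1):
            for w in range(n2):
                s = 0j
                for i in range(n0):
                    for j in range(n1):
                        for k in range(n2):
                            phase = -2j * math.pi * (u * i / n0 + v * j / n1 + w * k / n2)
                            s += a[i][j][k] * cmath.exp(phase)
                out[u][v][w] = s
    return out


if __name__ == "__main__":
    n0, n1, n2 = 4, 2, 8
    a = [[[complex(i + 2 * j, k - i) for k in range(n2)] for j in range(n1)] for i in range(n0)]
    fast, slow = fft3d(a), dft3d_naive(a)
    err = max(abs(fast[i][j][k] - slow[i][j][k])
              for i in range(n0) for j in range(n1) for k in range(n2))
    print(f"max |fft3d - naive DFT| = {err:.2e}")
```

Each pass is an independent batch of 1D FFTs, which is what makes it natural to distribute the work across the 64 CPEs of a core group; how the paper actually maps the passes onto CPEs is not reproduced here.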
Authors
陶小涵
庞建民
高伟
王琦
姚金阳
TAO Xiao-han; PANG Jian-min; GAO Wei; WANG Qi; YAO Jin-yang (Information Engineering University, Zhengzhou 450001, China; State Key Laboratory of Mathematical Engineering and Advanced Computing, Wuxi, Jiangsu 214125, China)
Source
《计算机科学》
CSCD
PKU Core Journal (北大核心)
2019, No. 4, pp. 321-328 (8 pages)
Computer Science
Funding
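Supported by the National Key Research and Development Program of China, "High Performance Computing" Key Special Project (2016YFB0200503)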