Performance Optimization of the FT Program Based on the SW26010 Processor

Cited by: 6
Abstract: Sunway TaihuLight is a supercomputer developed independently in China. Its processor chip, the SW26010, is a domestically developed heterogeneous many-core processor; each processor contains four core groups, and each core group consists of one management processing element (MPE) and 64 computing processing elements (CPEs). The NPB-FT program solves a three-dimensional partial differential equation using the fast Fourier transform and is widely used to evaluate the computation and collective communication capabilities of clusters. Using the FT program to test the multi-level parallel resources and the architecture of Sunway TaihuLight is therefore of practical significance. First, the program is rewritten into a master-slave version with the accelerated thread library (athread), so that the computational kernels run on the CPEs. Second, register communication among the CPEs and the data transfer channel between the MPE and the CPEs are used to eliminate the data transposition step in the FT program. Third, computation is overlapped with communication, so that in-core computing resources do not sit idle during inter-core communication. Finally, vectorization and instruction pipelining are applied to improve the program's data-level and instruction-level parallelism. Experimental results show a speedup of 66 for the 3D-32 problem size on a single core, 20 for the 3D-512 problem size on 64 cores, and 46 for the 3D-2048 problem size on 256 cores.
Authors: TAO Xiao-han (陶小涵); PANG Jian-min (庞建民); GAO Wei (高伟); WANG Qi (王琦); YAO Jin-yang (姚金阳) (Information Engineering University, Zhengzhou 450001, China; State Key Laboratory of Mathematical Engineering and Advanced Computing, Wuxi, Jiangsu 214125, China)
Source: Computer Science (《计算机科学》), CSCD, PKU Core Journal, 2019, No. 4, pp. 321-328 (8 pages)
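Funding: Supported by the National Key Research and Development Program of China, "High Performance Computing" key special project (2016YFB0200503)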
Keywords: Fourier transform; SW26010 processor; register communication; communication hiding