
Optimization of BLAS Level 3 Functions on SW1600
(Original title: 基于申威1600的3级BLAS GEMM函数优化, "Optimization of the Level 3 BLAS GEMM Function on the Sunway 1600")

Cited by: 11
Abstract: BLAS is one of the most important low-level math libraries in scientific computing, and its Level 3 functions are the most widely used. This paper presents a high-performance implementation of the Level 3 BLAS general matrix multiplication routine (GEMM) on the domestic Sunway 1600 (SW1600) platform. On a single core, the implementation is hand-optimized at the assembly level using platform-specific techniques: multiply-add instructions, loop unrolling, software pipelining with instruction reordering, SIMD vectorization, and register blocking. On multiple cores, a multi-threaded acceleration scheme suited to the platform is proposed. In single-core serial tests, the implementation achieves an average speedup of 4.72x over Goto BLAS, a well-known open-source math library. In multi-core scaling tests, the 4-thread version averages 3.02 times the performance of the single-thread version.
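To make two of the single-core techniques named in the abstract concrete, the following is a minimal C sketch of register blocking combined with loop unrolling. It is not the paper's hand-written SW1600 assembly kernel: the function names `gemm_naive` and `gemm_blocked_4x4`, the 4x4 tile size, the row-major layout, and the requirement that `m` and `n` be multiples of 4 are all assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Reference triple loop: C += A * B, all matrices row-major.
 * A is m x k, B is k x n, C is m x n. */
void gemm_naive(size_t m, size_t n, size_t k,
                const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < m; i++) {
        for (size_t j = 0; j < n; j++) {
            double acc = c[i * n + j];
            for (size_t p = 0; p < k; p++)
                acc += a[i * k + p] * b[p * n + j];
            c[i * n + j] = acc;
        }
    }
}

/* Register-blocked variant (illustrative): each 4x4 tile of C is kept in
 * 16 local accumulators that a compiler can map to registers. Per step of
 * the k loop, 4 values of A and 4 values of B are loaded once and reused
 * in 16 multiply-add operations, written out by hand (unrolled).
 * Assumption: m and n are multiples of 4; edge tiles are not handled. */
void gemm_blocked_4x4(size_t m, size_t n, size_t k,
                      const double *a, const double *b, double *c)
{
    assert(m % 4 == 0 && n % 4 == 0);
    for (size_t i = 0; i < m; i += 4) {
        for (size_t j = 0; j < n; j += 4) {
            double acc[4][4];
            for (size_t r = 0; r < 4; r++)
                for (size_t s = 0; s < 4; s++)
                    acc[r][s] = c[(i + r) * n + (j + s)];

            for (size_t p = 0; p < k; p++) {
                const double a0 = a[(i + 0) * k + p];
                const double a1 = a[(i + 1) * k + p];
                const double a2 = a[(i + 2) * k + p];
                const double a3 = a[(i + 3) * k + p];
                const double b0 = b[p * n + j + 0];
                const double b1 = b[p * n + j + 1];
                const double b2 = b[p * n + j + 2];
                const double b3 = b[p * n + j + 3];

                acc[0][0] += a0 * b0; acc[0][1] += a0 * b1;
                acc[0][2] += a0 * b2; acc[0][3] += a0 * b3;
                acc[1][0] += a1 * b0; acc[1][1] += a1 * b1;
                acc[1][2] += a1 * b2; acc[1][3] += a1 * b3;
                acc[2][0] += a2 * b0; acc[2][1] += a2 * b1;
                acc[2][2] += a2 * b2; acc[2][3] += a2 * b3;
                acc[3][0] += a3 * b0; acc[3][1] += a3 * b1;
                acc[3][2] += a3 * b2; acc[3][3] += a3 * b3;
            }

            /* Write the finished tile back to C once. */
            for (size_t r = 0; r < 4; r++)
                for (size_t s = 0; s < 4; s++)
                    c[(i + r) * n + (j + s)] = acc[r][s];
        }
    }
}
```

Calling `gemm_blocked_4x4(8, 4, 5, a, b, c)` on an 8x5 matrix `a` and a 5x4 matrix `b` should give the same `c` as `gemm_naive` with the same arguments. The point of the tile is the ratio of arithmetic to loads: 8 loads feed 16 multiply-adds per step of the inner loop, whereas the naive loop issues 2 loads per multiply-add. A production kernel would additionally use SIMD instructions and schedule loads ahead of their use, which this sketch leaves to the compiler.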
Source: Computer Systems & Applications (《计算机系统应用》), 2016, No. 12, pp. 234-239 (6 pages)
Funding: National Natural Science Foundation of China (91530103, 91530323)
Keywords: Sunway 1600; Level 3 BLAS; GEMM; HPC; multi-core
