2 articles found
1. xMath2.0: a high-performance extended math library for SW26010-Pro many-core processor
Authors: Fangfang Liu, Wenjing Ma, Yuwen Zhao, Daokun Chen, Yi Hu, Qinglin Lu, WanWang Yin, Xinhui Yuan, Lijuan Jiang, Hao Yan, Min Li, Hongsen Wang, Xinyu Wang, Chao Yang. CCF Transactions on High Performance Computing, 2023, No. 1, pp. 56-71 (16 pages)
High-performance extended math libraries are used by many scientific, engineering, and artificial intelligence applications, which usually involve many common mathematical computations and the most time-consuming functions. To take full advantage of high-performance processors, these functions need to be parallelized and optimized intensively. It is common for processor vendors to supply highly optimized commercial math libraries. For example, Intel maintains oneMKL, and NVIDIA has cuBLAS, cuSolver, and cuFFT. In this paper, we release a new-generation high-performance extended math library, xMath 2.0, specifically designed for the SW26010-Pro many-core processor, which includes four major modules: BLAS, LAPACK, FFT, and SPARSE. Each module is optimized for the domestic SW26010-Pro processor, leveraging parallelization on the many-core CPE mesh and optimization techniques such as assembly instruction rearrangement and computation-communication overlapping. In xMath 2.0, the BLAS module has an average performance increase of 146.02 times over the MPE version of GotoBLAS2, and the performance of BLAS level 3 functions has increased by 393.95 times. The LAPACK module (calling xMath BLAS) outperforms LAPACK (calling GotoBLAS2) by 233.44 times, and the FFT module is 47.63 times faster than FFTW 3.3.2. The library has been deployed on the domestic Sunway TaihuLight Pro supercomputer, where it has been used by dozens of users.
Keywords: Extended math library; SW26010-Pro; Sunway TaihuLight Pro; BLAS; LAPACK; FFT; SPARSE; Many-Core Processors
2. A Quantitative Evaluation of Vector Transcendental Functions on ARMv8-Based Processors
Authors: Jie Shen (沈洁), Biao Long (龙标), Chun Huang (黄春). Journal of Computer Science & Technology (indexed in SCIE, EI, CSCD), 2023, No. 3, pp. 686-701 (16 pages)
Transcendental functions are important in various high-performance computing applications. Because these functions are time-consuming and the vector units on modern processors are becoming wider and more scalable, there is an increasing demand for developing and using vector transcendental functions in such performance-hungry applications. However, the performance and accuracy of vector transcendental functions remain largely unexplored. To address this issue, we perform a comprehensive evaluation of two vector math libraries based on Single Instruction Multiple Data (SIMD) intrinsics on two ARMv8-compatible processors. We first design dedicated microbenchmarks that help us understand the performance behavior of vector transcendental functions. Then, we propose a piecewise, quantitative evaluation method with a set of meaningful metrics to quantify their performance and accuracy. By analyzing the experimental results, we find that vector transcendental functions achieve good speedups thanks to vectorization and algorithm optimization. Moreover, vector math libraries can replace scalar math libraries in many cases because of their improved performance and satisfactory accuracy. Despite this, the implementations of vector math libraries are still immature, which means further optimization is needed, and our evaluation reveals feasible optimization solutions for future vector math libraries.
Keywords: transcendental function; vector math library; piecewise quantitative evaluation; microbenchmarking; ARMv8-based processor