A high‑performance matrix transposition for a new MIMD architecture processor PEZY‑SC3s
Authors: Yaling Liang, Qinglin Wang, Shun Yang, Rui Xia, Weihao Guo, Jie Liu. CCF Transactions on High Performance Computing, 2025, Issue 4, pp. 323-335 (13 pages).
Matrix transposition is a vital kernel widely used in various fields. However, its memory-intensive nature leads to significant memory access conflicts, making it a performance bottleneck. Optimizing matrix transposition algorithms around architectural features is therefore crucial for improving the performance of related applications and enhancing system resource utilization. The PEZY-SC3s, a new MIMD (Multiple Instruction, Multiple Data) architecture processor, has numerous cores and supports SIMD instructions, demonstrating tremendous potential for high-performance computing. However, no existing matrix transposition algorithm is tailored to the PEZY-SC3s architecture to fully leverage its computing potential. We propose a high-performance matrix transposition algorithm for the PEZY-SC3s. First, we block the matrix according to the cache architecture at the microkernel level to improve the memory access pattern. Then, we separate read and write operations by using the PEZY-SC3s's Local Memory, resolving cache-line contention. Finally, we design several processor-level parallel strategies and implement a dynamic selection strategy, based on a performance heuristic algorithm, for different matrix shapes, alleviating bank conflicts and enhancing performance. Experimental results show that our implementation achieves an average speedup of 17.27 times across 60 matrices compared to the baseline algorithm, with a maximum bandwidth utilization of 87.7%.
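The first two steps of the abstract (blocking the matrix, then separating reads from writes through a local buffer) can be illustrated with a generic sketch. This is not the authors' PEZY-SC3s implementation: the function names `transpose_blocked` and `transpose_naive`, the tile size, and the use of a plain Python list as a stand-in for Local Memory are all assumptions made for illustration, and the sketch says nothing about the parallel strategies or the heuristic selection.

```python
def transpose_blocked(src, rows, cols, tile=4):
    """Transpose a row-major rows x cols matrix stored in a flat list.

    Illustrative only: `tile` is an arbitrary block size, and `local`
    merely stands in for a fast on-chip buffer.
    """
    dst = [0] * (rows * cols)
    for r0 in range(0, rows, tile):
        for c0 in range(0, cols, tile):
            r1 = min(r0 + tile, rows)  # clamp the tile at the matrix edge
            c1 = min(c0 + tile, cols)
            # Read phase: copy the tile row by row, so every read from
            # src touches a contiguous run of elements.
            local = [src[r * cols + c0 : r * cols + c1] for r in range(r0, r1)]
            # Write phase: emit the tile column by column, so every run
            # written to dst is contiguous as well.
            for c in range(c0, c1):
                base = c * rows + r0
                for i in range(r1 - r0):
                    dst[base + i] = local[i][c - c0]
    return dst


def transpose_naive(src, rows, cols):
    """Reference transpose: contiguous writes, strided reads."""
    return [src[r * cols + c] for c in range(cols) for r in range(rows)]
```

For a 2 x 3 matrix stored as `[1, 2, 3, 4, 5, 6]`, `transpose_blocked(src, 2, 3)` returns `[1, 4, 2, 5, 3, 6]`, the same as the naive version. The point of staging each tile in a buffer is that neither the read pass nor the write pass has to stride across a large array; on real hardware the tile size would be chosen to fit the cache or local memory.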
Keywords: matrix transposition; MIMD; parallel algorithm; PEZY-SC3s