Optimizing small matrix multiplications via batch grouping on multi-core DSPs
Authors: Xiaotian Chen, Pengyu Wang, Jianbin Fang, Peng Zhang, Chun Huang. CCF Transactions on High Performance Computing, 2026, Issue 1, pp. 22-36 (15 pages).
General matrix multiplication is a vital operation in high-performance computing and has wide applications in areas such as computational fluid dynamics and deep learning (DL). While there are many optimization techniques available for large matrix multiplications on CPUs and GPUs, handling batches of small matrix operations requires innovative solutions. Digital Signal Processors (DSPs) offer a promising alternative for processing DL workloads; however, the architectural differences between DSPs and conventional processors like CPUs and GPUs necessitate the development of specialized optimization strategies. This paper introduces mtSmm, an optimization approach tailored for small matrix multiplications on multi-core DSPs. Our approach focuses on the batch-as-vector paradigm, efficient on-chip memory management, and a well-designed micro-kernel. By maximizing computational resource utilization, optimizing instruction-level and thread-level parallelism, and enhancing memory access patterns, our approach significantly improves performance. Experimental results on the FT-M7032 DSP demonstrate that our method achieves up to 83% of the theoretical peak performance of the hardware, significantly outperforming current state-of-the-art methods for batches of small matrices.
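The abstract's "batch-as-vector" paradigm can be illustrated with a small host-side sketch (this is not the authors' DSP implementation, and the function names here are hypothetical): instead of looping over the batch and running one tiny GEMM at a time, the batch dimension is made innermost, so every multiply-add in the small GEMM becomes a single vector operation across the whole batch, which is how SIMD lanes on a DSP would be filled.

```python
import numpy as np

def batched_small_gemm(A, B):
    """Baseline: one small GEMM per batch entry (poor lane utilization)."""
    batch, m, _ = A.shape
    n = B.shape[2]
    C = np.empty((batch, m, n), dtype=A.dtype)
    for b in range(batch):
        C[b] = A[b] @ B[b]
    return C

def batch_as_vector_gemm(A, B):
    """Batch-as-vector sketch: move the batch axis innermost so each
    scalar multiply-add of the small GEMM acts on a full vector of
    batch entries at once."""
    # Reorder (batch, m, k) -> (m, k, batch) so the batch is contiguous.
    Av = np.ascontiguousarray(np.moveaxis(A, 0, -1))
    Bv = np.ascontiguousarray(np.moveaxis(B, 0, -1))
    m, k, batch = Av.shape
    n = Bv.shape[1]
    Cv = np.zeros((m, n, batch), dtype=A.dtype)
    for i in range(m):
        for j in range(n):
            for p in range(k):
                # One elementwise vector FMA across the entire batch.
                Cv[i, j] += Av[i, p] * Bv[p, j]
    return np.moveaxis(Cv, -1, 0)

rng = np.random.default_rng(0)
A = rng.standard_normal((64, 4, 4))
B = rng.standard_normal((64, 4, 4))
assert np.allclose(batched_small_gemm(A, B), batch_as_vector_gemm(A, B))
```

The point of the reordering is that for 4x4 matrices a per-matrix kernel leaves most vector lanes idle, while a batch of 64 matrices fills the lanes completely; the paper's on-chip memory management and micro-kernel design then govern how these batch-major tiles are staged and consumed on the actual hardware.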
Keywords: small matrix multiplications; GPDSPs; performance model