Optimizing sparse-dense matrix-matrix multiplication for DCUs
Authors: Hengliang Guo, Yubo Han, Haolei Wang, Shengguang Zhu, Gang Wu, Yang Guo, Xiangdong Liu, Chuanqiang Li. CCF Transactions on High Performance Computing, 2026, Issue 1, pp. 49-60 (12 pages)
Row-split, the mainstream parallelization strategy for sparse-dense matrix-matrix multiplication (SpMM), suffers from load imbalance across sparse-matrix rows and from parallelism that degrades as matrix size increases. To address both issues, we propose a new framework for parallel SpMM computation on DCUs (GPU-like accelerators). The framework is based on the standard CSR format and requires no additional format conversion, and thus offers strong generality. To address load imbalance, we introduce a coarse-grained two-level binning strategy that categorizes the rows of the sparse matrix into three groups based on their number of non-zero elements. A dedicated computation kernel is designed for each category to better accommodate different types of computational tasks, thereby significantly improving load balance. To address the decline in parallelism as the matrix size increases, we design multiple optimized kernels and dynamically select the optimal configuration at runtime to maximize parallelism. Experimental results show that the proposed SpMM framework significantly outperforms two current state-of-the-art row-split based SpMM algorithms (rocSparse and GE-SpMM), achieving speedups of 5.4× and 2.28×, respectively.
Keywords: SpMM, Sparse matrix, Load balancing, DCU accelerator
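
The binning idea described in the abstract can be illustrated with a small CPU-side sketch. This is not the authors' implementation: the abstract gives neither the bin thresholds nor the kernel designs, so the function names (`bin_rows_by_nnz`, `spmm_rows`, `spmm_binned`) and the thresholds `short_max` and `medium_max` are hypothetical. On a DCU each bin would presumably be dispatched to its own kernel; here a single pure-Python routine stands in for all three.

```python
from typing import List, Sequence, Tuple


def bin_rows_by_nnz(
    row_ptr: Sequence[int], short_max: int = 4, medium_max: int = 32
) -> Tuple[List[int], List[int], List[int]]:
    """Split CSR rows into three bins by non-zero count.

    The thresholds are hypothetical; the abstract does not state them.
    """
    short, medium, long_ = [], [], []
    for row in range(len(row_ptr) - 1):
        nnz = row_ptr[row + 1] - row_ptr[row]
        if nnz <= short_max:
            short.append(row)
        elif nnz <= medium_max:
            medium.append(row)
        else:
            long_.append(row)
    return short, medium, long_


def spmm_rows(rows, row_ptr, col_idx, vals, B, C) -> None:
    """Row-split SpMM on a subset of rows: C[r, :] = sum_k A[r, k] * B[k, :]."""
    n_cols = len(B[0])
    for r in rows:
        acc = [0.0] * n_cols
        for p in range(row_ptr[r], row_ptr[r + 1]):
            a = vals[p]
            b_row = B[col_idx[p]]
            for j in range(n_cols):
                acc[j] += a * b_row[j]
        C[r] = acc


def spmm_binned(row_ptr, col_idx, vals, B, short_max=4, medium_max=32):
    """Compute C = A @ B with A in CSR form, processing one row bin at a time."""
    n_rows = len(row_ptr) - 1
    C = [[0.0] * len(B[0]) for _ in range(n_rows)]
    # Stand-in for launching one dedicated kernel per bin on the accelerator.
    for rows in bin_rows_by_nnz(row_ptr, short_max, medium_max):
        spmm_rows(rows, row_ptr, col_idx, vals, B, C)
    return C


# A = [[1, 0, 2], [0, 0, 0], [0, 3, 0]] in CSR form.
row_ptr = [0, 2, 2, 3]
col_idx = [0, 2, 1]
vals = [1, 2, 3]
B = [[1, 2], [3, 4], [5, 6]]
C = spmm_binned(row_ptr, col_idx, vals, B, short_max=0, medium_max=1)
```

With these toy thresholds, row 1 (no non-zeros) falls in the first bin, row 2 in the second, and row 0 in the third, and `C` equals the dense product of `A` and `B`. Grouping rows this way lets work of similar size be scheduled together, which is the load-balancing motivation stated in the abstract.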