Leveraging Large-Scale Data for Efficient Low-Bit CUTLASS GEMM Optimization via Neural Networks
Authors: Hong Guo, Nianhui Guo (+1 more author), Christoph Meinel, Haojin Yang. Big Data Mining and Analytics, 2026, Issue 2, pp. 632-652 (21 pages)
Optimizing GEneral Matrix Multiplication (GEMM) on GPU platforms is becoming increasingly critical to meet the growing computational demands of modern deep neural network research. While significant progress has been made in accelerating high-precision GEMM, the optimization of low-bit GEMM remains a challenging open problem. The CUTLASS library provides highly optimized low-bit GEMM templates leveraging Tensor Cores; however, performance varies considerably depending on tile and pipeline configurations across different GPU architectures. In this work, we propose a novel auto-tuning framework for low-bit CUTLASS GEMM, utilizing a neural network model to predict optimal GEMM template parameters for target GPUs. Our model is trained on a synthetic dataset with up to 116,100 unique samples, encompassing diverse matrix sizes across various Ampere GPUs, and is thoroughly evaluated on these hardware platforms. Experimental results show that our method achieves an accuracy of up to 95.11% on the validation dataset. Furthermore, real-time evaluations of low-bit data types on the A100 GPU demonstrate speedups of up to 1.99× for GEMM operations and 1.28× for the linear layer, compared to the default CUTLASS templates.
Keywords: Low-bit GEneral Matrix Multiplication (GEMM); CUTLASS optimization; neural network auto-tuning; Tensor Cores; tile and pipeline; large-scale dataset
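The abstract frames auto-tuning as predicting the best template parameters for a given matrix size. The sketch below shows one way such a problem could be set up as classification; it is not the authors' implementation. The candidate list, the feature encoding, and all helper names are hypothetical, and the timings in the usage note are made-up illustrative values, not results from the paper.

```python
import math

# Hypothetical candidate configurations, each written as
# (tile_m, tile_n, tile_k, pipeline_stages). Illustrative values only;
# they are not taken from the paper or from any CUTLASS default.
CANDIDATES = [
    (128, 128, 64, 3),
    (128, 256, 64, 3),
    (64, 64, 128, 4),
]


def encode_shape(m, n, k):
    """Encode a GEMM problem size (M, N, K) as log2-scaled features."""
    return [math.log2(m), math.log2(n), math.log2(k)]


def best_config_index(timings_ms):
    """Training label for one sample: index of the fastest candidate."""
    return min(range(len(timings_ms)), key=timings_ms.__getitem__)


def speedup(default_ms, tuned_ms):
    """Speedup of the tuned configuration over the default one."""
    return default_ms / tuned_ms


def top1_accuracy(predicted, labels):
    """Fraction of samples where the predicted index equals the label."""
    hits = sum(p == y for p, y in zip(predicted, labels))
    return hits / len(labels)
```

As a usage example, if the three candidates took 0.9, 0.5, and 0.7 ms on some problem size (made-up numbers), `best_config_index([0.9, 0.5, 0.7])` would select index 1, and that index would serve as the label a model learns to predict from `encode_shape(m, n, k)`.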