Enhancing LLM Inference Performance on ARM CPUs Through Software and Hardware Co-Optimization Strategies
Authors: CHENG ZHANG, XINGYU ZHU, LONGHAO CHEN, TINGJIE YANG, EVENS PAN, GUOSHENG YU, YANG ZHAO, XIGUANG WU, BO LI, WEI MAO, GENQUAN HAN. Integrated Circuits and Systems, 2025, No. 2, pp. 49-57 (9 pages).
Large language models (LLMs) have exhibited remarkable performance across a broad spectrum of tasks, yet their extensive computational and memory requirements present substantial challenges for deployment in resource-constrained scenarios. To address these challenges, this work introduces software and hardware co-optimization strategies aimed at enhancing the inference performance of LLMs on ARM CPU-based platforms. A mixed-precision quantization technique is employed that preserves the precision of critical weights to maintain model accuracy while quantizing non-essential weights to INT8, thereby reducing the model's memory footprint. This work also exploits the SIMD instruction set of ARM CPUs to process model data efficiently. Furthermore, the inference framework is optimized by fusing components of the attention computation and by streamlining the dequantization process through modifications to the scaling factor. Together, these enhancements substantially reduce model memory usage and improve throughput in both the prefill and decode stages. The approach is demonstrated by optimizing the Qwen-1.8B model on Armv9: accuracy drops by only 0.66%, memory usage falls to 58.8% of the baseline, and inference performance improves by 4.09× in the prefill stage and 15.23× in the decode stage relative to the baseline.
Keywords: model compression; mixed-precision quantization; ARM CPUs; SIMD optimization; LLM inference performance
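The mixed-precision quantization and scale-factor handling described in the abstract can be illustrated with a minimal sketch. The abstract does not say how critical weights are identified or exactly how the scaling factor is modified, so this Python code is an assumption-laden illustration, not the authors' implementation. It treats the input columns with the largest absolute weights as "critical" and keeps them in full precision. It quantizes the remaining columns to INT8 with one symmetric scale per output row, and it applies that scale once per output after the inner product instead of dequantizing every weight. The names `MixedPrecisionLinear`, `select_critical_columns`, `quantize_int8_rows`, and `memory_bytes` are hypothetical, and the memory estimate assumes FP16 storage for full-precision weights and scales.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def select_critical_columns(w: Matrix, k: int) -> List[int]:
    """Hypothetical salience rule: the k input columns with the largest max |w|."""
    n_cols = len(w[0])
    col_max = [max(abs(row[j]) for row in w) for j in range(n_cols)]
    ranked = sorted(range(n_cols), key=lambda j: col_max[j], reverse=True)
    return sorted(ranked[:k])


def quantize_int8_rows(w: Matrix, cols: List[int]) -> Tuple[List[List[int]], List[float]]:
    """Symmetric INT8 quantization of the given columns, one scale per output row."""
    q, scales = [], []
    for row in w:
        amax = max((abs(row[j]) for j in cols), default=0.0)
        s = amax / 127.0 if amax > 0 else 1.0
        q.append([max(-127, min(127, round(row[j] / s))) for j in cols])
        scales.append(s)
    return q, scales


class MixedPrecisionLinear:
    """y = W x with critical columns kept in float and the rest stored as INT8."""

    def __init__(self, w: Matrix, k_critical: int):
        self.crit = select_critical_columns(w, k_critical)
        crit_set = set(self.crit)
        self.rest = [j for j in range(len(w[0])) if j not in crit_set]
        self.w_fp = [[row[j] for j in self.crit] for row in w]  # full-precision part
        self.w_q, self.scales = quantize_int8_rows(w, self.rest)  # INT8 part

    def forward(self, x: List[float]) -> List[float]:
        x_crit = [x[j] for j in self.crit]
        x_rest = [x[j] for j in self.rest]
        out = []
        for fp_row, q_row, s in zip(self.w_fp, self.w_q, self.scales):
            fp_part = sum(a * b for a, b in zip(fp_row, x_crit))
            # Scale folded out of the inner loop: s * sum(q * x) == sum((q * s) * x),
            # so each output needs one multiply by s instead of one per weight.
            q_part = s * sum(qv * xv for qv, xv in zip(q_row, x_rest))
            out.append(fp_part + q_part)
        return out

    def memory_bytes(self) -> int:
        """Storage estimate assuming FP16 (2 bytes) for float weights and scales."""
        rows = len(self.w_q)
        return rows * (2 * len(self.crit) + len(self.rest) + 2)
```

For example, with a 3×4 weight matrix and `k_critical=1`, one column stays in FP16 and three are stored as INT8, so `memory_bytes()` returns 21 bytes versus 24 for an all-FP16 layer. On an ARM CPU the inner products would be vectorized with SIMD instructions, which plain Python does not model.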