Journal Articles
2 articles found
Enhancing LLM Inference Performance on ARM CPUs Through Software and Hardware Co-Optimization Strategies
1
Authors: CHENG ZHANG, XINGYU ZHU, LONGHAO CHEN, TINGJIE YANG, EVENS PAN, GUOSHENG YU, YANG ZHAO, XIGUANG WU, BO LI, WEI MAO, GENQUAN HAN. 《Integrated Circuits and Systems》, 2025, No. 2, pp. 49–57 (9 pages)
Large language models (LLMs) have exhibited remarkable performance across a broad spectrum of tasks, yet their extensive computational and memory requirements present substantial challenges for deployment in resource-constrained scenarios. To address these challenges, this work introduces software and hardware co-optimization strategies aimed at enhancing the inference performance of LLMs on ARM CPU-based platforms. A mixed-precision quantization technique is employed, preserving the precision of critical weights to maintain model accuracy while quantizing non-essential weights to INT8, thereby reducing the model's memory footprint. This work also capitalizes on the SIMD instruction set of ARM CPUs to process model data efficiently. Furthermore, the inference framework is optimized by fusing components of the attention computation and streamlining the dequantization process through modifications to the scaling factor. These enhancements yield a significant reduction in model memory usage and improved throughput during the prefill and decode stages. The efficacy of the proposed approach is demonstrated through the optimization of the Qwen-1.8B model on Armv9, with only a 0.66% decrease in accuracy and a reduction in memory usage to 58.8% of the baseline, while achieving 4.09× and 15.23× increases in inference performance for the prefill and decode stages over the baseline, respectively.
Keywords: model compression, mixed-precision quantization, ARM CPUs, SIMD optimization, LLM inference performance
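The mixed-precision idea summarized in the abstract can be illustrated with a minimal sketch. This is an assumption-laden illustration, not the paper's method: it uses a symmetric per-tensor INT8 scheme and picks "critical" weights by magnitude, whereas the paper's actual selection criterion and quantization granularity are not stated in the abstract.

```python
def mixed_precision_quantize(weights, keep_top=1):
    """Quantize most weights to INT8, keeping the largest-magnitude
    ("critical") weights in full precision.

    Illustrative sketch only: the selection criterion (magnitude) and
    the per-tensor symmetric scale are assumptions for this example.
    """
    # Indices of the keep_top largest-magnitude weights stay in full precision.
    order = sorted(range(len(weights)), key=lambda i: -abs(weights[i]))
    critical = set(order[:keep_top])
    rest = [w for i, w in enumerate(weights) if i not in critical]
    # Symmetric scale: map the largest remaining magnitude to 127 (INT8 max).
    scale = max(abs(w) for w in rest) / 127.0
    quantized = {i: round(w / scale)
                 for i, w in enumerate(weights) if i not in critical}
    return quantized, scale, critical

def dequantize(quantized, scale, critical, weights):
    # Reconstruct: critical weights pass through; others are rescaled INT8.
    return [weights[i] if i in critical else quantized[i] * scale
            for i in range(len(weights))]

weights = [0.02, -1.27, 0.64, 0.005]
q, scale, crit = mixed_precision_quantize(weights, keep_top=1)
recon = dequantize(q, scale, crit, weights)
```

Here the outlier weight -1.27 is preserved exactly while the remaining weights cost one byte each, which is the memory-versus-accuracy trade-off the abstract describes; the paper additionally reworks the scaling factor so that dequantization folds cheaply into SIMD compute.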
DeepSeek: Toward Global Education Empowerment for the Whole Society (cited by: 1)
2
Author: Fei Wu. 《Frontiers of Digital Education》, 2025, No. 2, p. 3 (1 page)
DeepSeek has recently gained widespread attention for its impressive inference performance, cost-effectiveness, and open-source advantages. DeepSeek is rewriting the rules of AI, bringing AI empowerment directly to the whole of society. As DeepSeek continues to grow, it is proving that the future of AI does not belong to the few; it belongs to everyone, and that is a game-changer.
Keywords: cost-effectiveness, open-source advantages, global education empowerment, AI inference performance