
Research on Collaborative Concurrency Performance Evaluation for On-Premise LLM Deployment
Abstract: The on-premise deployment of Large Language Models (LLMs) faces critical challenges, including non-standardized evaluation workflows and the decoupling of application performance from hardware resource monitoring. To address these issues, this paper presents ContainTest-AI, an integrated evaluation platform. First, the platform is built on an asynchronous concurrency engine, synchronized GPU monitoring, and containerized deployment. Second, empirical tests on the Qwen3-30B-A3B-Instruct model demonstrate the platform's ability to precisely locate the performance inflection point at 128 concurrent users: at this threshold, character throughput saturates at 1935.79 tokens/s, GPU power consumption reaches 256 W, and the request success rate first drops from 100% to 97.9%. These findings directly reveal the bottleneck correlation between application performance and hardware resources, and confirm this point as the optimal operating point for peak energy efficiency. Finally, the platform enables quantifiable horizontal comparison through performance tests on different models of the same scale. This study provides a reliable quantitative decision basis for on-premise LLM deployment and offers effective support for model selection and computational resource planning.
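The asynchronous concurrency engine itself is not reproduced in this abstract. As an illustration only, a minimal asyncio sketch of a per-concurrency-level measurement loop could look like the following; the request stub, semaphore cap, and metric names are all our assumptions for illustration, not the platform's actual API:

```python
import asyncio
import time

async def fake_llm_request(sem: asyncio.Semaphore) -> tuple[bool, int]:
    """Stand-in for one chat-completion call: (succeeded, tokens generated)."""
    async with sem:
        await asyncio.sleep(0.01)  # placeholder for real network latency
        return (True, 64)          # pretend 64 tokens were generated

async def run_level(concurrency: int, total_requests: int = 256) -> dict:
    """Fire total_requests with at most `concurrency` in flight; report metrics."""
    sem = asyncio.Semaphore(concurrency)
    start = time.perf_counter()
    results = await asyncio.gather(
        *(fake_llm_request(sem) for _ in range(total_requests)))
    elapsed = time.perf_counter() - start
    ok = sum(1 for good, _ in results if good)
    tokens = sum(t for _, t in results)
    return {
        "concurrency": concurrency,
        "success_rate": ok / total_requests,
        "throughput_tok_s": tokens / elapsed,
    }

metrics = asyncio.run(run_level(concurrency=32))
```

Sweeping `concurrency` upward (e.g. 1, 8, 32, 64, 128, 256) and watching for the level where `throughput_tok_s` stops growing while `success_rate` first falls below 100% is one way to locate an inflection point like the 128-user threshold reported above.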
Authors: Zhou Shixiang; Hong Jingzhi; Li Yilin; Zhang Liangbin (College of Big Data and Software Engineering, Zhejiang Wanli University, Ningbo, Zhejiang 315100, China)
Source: Computer Era (《计算机时代》), 2026, No. 3, pp. 90-95
Funding: Zhejiang Province Xinmiao Talent Science and Technology Innovation Project (2025R420A010); National Undergraduate Innovation Training Program (S202510876004)
Keywords: LLM performance testing; concurrency testing; GPU synchronized monitoring; performance inflection point; containerized evaluation
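The GPU synchronized monitoring named above is likewise not detailed in this abstract. A minimal polling sketch based on the standard `nvidia-smi` query interface might look like this; the query fields are nvidia-smi's own, while the helper names and returned dictionary are our assumptions:

```python
import subprocess

# Fields accepted by `nvidia-smi --query-gpu=...` (see `nvidia-smi --help-query-gpu`)
QUERY_FIELDS = "power.draw,utilization.gpu,memory.used"

def parse_gpu_sample(csv_line: str) -> dict:
    """Parse one CSV line produced with --format=csv,noheader,nounits."""
    power_w, util_pct, mem_mib = (float(x) for x in csv_line.split(","))
    return {"power_w": power_w, "util_pct": util_pct, "mem_mib": mem_mib}

def sample_gpu() -> dict:
    """Poll the first GPU once; requires nvidia-smi on PATH."""
    out = subprocess.check_output(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}",
         "--format=csv,noheader,nounits"],
        text=True)
    return parse_gpu_sample(out.splitlines()[0])
```

Timestamping each sample and each request completion against a common clock is one straightforward way to correlate application metrics with readings such as the 256 W power draw reported at the inflection point.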