
Research on Collaborative Concurrency Performance Evaluation for On-Premise LLM Deployment
Abstract: The on-premise deployment of Large Language Models (LLMs) faces critical challenges, including non-standardized evaluation workflows and the decoupling of application performance from hardware resource monitoring. To address these issues, this paper presents ContainTest-AI, an integrated evaluation platform. First, the platform is built on an asynchronous concurrency engine, synchronized GPU monitoring, and containerized deployment. Second, empirical tests on the Qwen3-30B-A3B-Instruct model demonstrate the platform's ability to precisely locate the performance inflection point at 128 concurrent users: at this threshold, character throughput saturates at 1935.79 tokens/s, GPU power consumption reaches 256 W, and the request success rate first drops from 100% to 97.9%. These findings directly reveal the bottleneck correlation between application performance and hardware resources, and confirm this point as the optimal operating point for peak energy efficiency. Finally, the platform enables quantifiable horizontal comparison through performance tests on different models of the same scale. This study provides a reliable quantitative decision basis for on-premise LLM deployment and offers effective support for model selection and computational resource planning.
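The asynchronous concurrency engine itself is not reproduced in this abstract. As an illustration only, a minimal asyncio sketch of a per-concurrency-level measurement loop could look like the following; the request stub, semaphore cap, and metric names are all our assumptions for illustration, not the platform's actual API:

```python
import asyncio
import time

async def fake_llm_request(sem: asyncio.Semaphore) -> tuple[bool, int]:
    """Stand-in for one chat-completion call: (succeeded, tokens generated)."""
    async with sem:
        await asyncio.sleep(0.01)  # placeholder for real network latency
        return (True, 64)          # pretend 64 tokens were generated

async def run_level(concurrency: int, total_requests: int = 256) -> dict:
    """Fire total_requests with at most `concurrency` in flight; report metrics."""
    sem = asyncio.Semaphore(concurrency)
    start = time.perf_counter()
    results = await asyncio.gather(
        *(fake_llm_request(sem) for _ in range(total_requests)))
    elapsed = time.perf_counter() - start
    ok = sum(1 for good, _ in results if good)
    tokens = sum(t for _, t in results)
    return {
        "concurrency": concurrency,
        "success_rate": ok / total_requests,
        "throughput_tok_s": tokens / elapsed,
    }

metrics = asyncio.run(run_level(concurrency=32))
```

Sweeping `concurrency` upward (e.g. 1, 8, 32, 64, 128, 256) and watching for the level where `throughput_tok_s` stops growing while `success_rate` first falls below 100% is one way to locate an inflection point like the 128-user threshold reported above.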
Authors: Zhou Shixiang; Hong Jingzhi; Li Yilin; Zhang Liangbin (College of Big Data and Software Engineering, Zhejiang Wanli University, Ningbo, Zhejiang 315100, China)
Source: Computer Era (《计算机时代》), 2026, No. 3, pp. 90-95
Funding: Zhejiang Province Xinmiao Talent Science and Technology Innovation Project (2025R420A010); National Undergraduate Innovation Training Program (S202510876004)
Keywords: LLM performance testing; concurrency testing; GPU synchronized monitoring; performance inflection point; containerized evaluation
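The GPU synchronized monitoring named above is likewise not detailed in this abstract. A minimal polling sketch based on the standard `nvidia-smi` query interface might look like this; the query fields are nvidia-smi's own, while the helper names and returned dictionary are our assumptions:

```python
import subprocess

# Fields accepted by `nvidia-smi --query-gpu=...` (see `nvidia-smi --help-query-gpu`)
QUERY_FIELDS = "power.draw,utilization.gpu,memory.used"

def parse_gpu_sample(csv_line: str) -> dict:
    """Parse one CSV line produced with --format=csv,noheader,nounits."""
    power_w, util_pct, mem_mib = (float(x) for x in csv_line.split(","))
    return {"power_w": power_w, "util_pct": util_pct, "mem_mib": mem_mib}

def sample_gpu() -> dict:
    """Poll the first GPU once; requires nvidia-smi on PATH."""
    out = subprocess.check_output(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}",
         "--format=csv,noheader,nounits"],
        text=True)
    return parse_gpu_sample(out.splitlines()[0])
```

Timestamping each sample and each request completion against a common clock is one straightforward way to correlate application metrics with readings such as the 256 W power draw reported at the inflection point.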