ArkGPU: enabling applications' high-goodput co-location execution on multitasking GPUs
Authors: Jie Lou, Yiming Sun, Jie Zhang, Huawei Cao, Yuan Zhang, Ninghui Sun. CCF Transactions on High Performance Computing, 2023, Issue 3, pp. 304-321 (18 pages).
With the development of deep learning, hardware accelerators represented by GPUs have been used to accelerate the execution of deep learning applications. A key problem in GPU clusters is how to schedule various deep learning applications, including training applications and latency-critical inference applications, to achieve optimal system performance. In cloud datacenters, inference applications often require fewer resources, and the exclusive GPU execution of one inference application can result in a significant waste of GPU resources. Existing work mainly focuses on the co-location execution of multiple inference applications in datacenters using MPS (Multi-Process Service). There are several problems with this execution pattern: datacenters may be in a low-workload state for long periods of time due to the diurnal pattern of inference applications, MPS-based data sharing can lead to interaction errors between contexts, and resource contention may cause Quality of Service (QoS) violations. To solve these problems, we propose ArkGPU, a runtime system that dynamically allocates resources. ArkGPU can improve the resource utilization of the cluster while guaranteeing the QoS of inference applications. ArkGPU comprises a performance predictor, a scheduler, a resource limiter, and an adjustment unit. We conduct extensive experiments on the NVIDIA V100 GPU to verify the effectiveness of ArkGPU. ArkGPU achieves high goodput for latency-critical applications, whose throughput increases by 584.27% on average compared to MPS. When multiple applications are deployed simultaneously on ArkGPU, goodput improves by 94.98% compared to k8s-native and by 38.65% compared to MPS.
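The abstract reports goodput and throughput gains without defining the metric in this record. As a rough illustration only, the sketch below assumes the common reading of goodput as the rate of requests that complete within their QoS latency target; the paper's exact definition may differ. All names (`Request`, `goodput`, `increase_pct_to_ratio`) are hypothetical and are not taken from ArkGPU.

```python
from dataclasses import dataclass


@dataclass
class Request:
    # Observed end-to-end latency of one inference request (assumed unit: ms).
    latency_ms: float


def goodput(requests, qos_target_ms, window_s):
    """Requests per second that finished within the QoS latency target.

    Assumed definition for illustration; not taken from the paper.
    """
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    on_time = sum(1 for r in requests if r.latency_ms <= qos_target_ms)
    return on_time / window_s


def relative_increase_pct(new, baseline):
    """Percentage increase of `new` over `baseline`."""
    return (new - baseline) / baseline * 100.0


def increase_pct_to_ratio(pct):
    """Convert an 'increase of pct%' into a multiplicative factor."""
    return 1.0 + pct / 100.0


if __name__ == "__main__":
    # Made-up latencies, used only to exercise the functions.
    sample = [Request(10.0), Request(20.0), Request(60.0), Request(30.0)]
    print(goodput(sample, qos_target_ms=50.0, window_s=2.0))
    print(increase_pct_to_ratio(584.27))
```

Under this reading, an average throughput increase of 584.27% corresponds to roughly 6.84 times the MPS baseline, and a 38.65% goodput improvement to roughly 1.39 times.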
Keywords: goodput; GPU sharing; co-location; latency-critical jobs scheduling; QoS guarantee