
Cloud-edge Coordinated Scheduling Method for Deep Learning Jobs (云边协同的深度学习作业调度方法)
Abstract: Edge servers provide low-latency, high-performance services for mobile intelligent applications. However, because the load on edge servers fluctuates significantly over time, many edge servers sit idle during periods of low load, and their computing resources are not fully utilized. In contrast to these underutilized edge servers, computing resources in cloud computing clusters remain relatively scarce for deep learning training jobs as artificial intelligence becomes more widely applied in daily life. Existing cluster scheduling strategies cannot effectively use idle computing resources outside the cloud computing cluster, even though doing so would ease the cluster's resource shortage and allow more deadline-sensitive deep learning training jobs to complete before their deadlines. To address this problem, this study designs a cluster scheduling strategy for deadline-sensitive deep learning training jobs that coordinates the scheduling of cloud computing resources and idle edge computing resources. The strategy exploits the performance characteristics of different deep learning training jobs and the availability of idle edge servers, so that more deadline-sensitive jobs complete before their deadlines. Simulation results show that the cloud-edge coordinated scheduling method outperforms the baseline methods in deadline satisfaction ratio, makes effective use of idle edge servers, and improves the utilization of computing resources.
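The abstract does not give the scheduling algorithm itself, so the following is only a minimal illustrative sketch of the general idea of placing deadline-sensitive jobs across cloud devices and idle edge servers. It is not the authors' method. The `Job` fields, the `edge_slowdown` factor, the one-device-per-job model, and the earliest-deadline-first greedy rule are all assumptions introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    work: float           # hours of training on one cloud device (assumed unit)
    deadline: float       # hours from now by which the job must finish
    edge_slowdown: float  # assumed factor (>= 1.0) by which an edge server is slower


def schedule(jobs, num_cloud, num_edge):
    """Greedy sketch: visit jobs in earliest-deadline-first order and prefer
    an idle edge server whenever the job still meets its deadline there,
    keeping the scarcer cloud devices for jobs that need them."""
    cloud_free = [0.0] * num_cloud  # time at which each cloud device is free
    edge_free = [0.0] * num_edge    # time at which each edge server is free
    placement = {}
    for job in sorted(jobs, key=lambda j: j.deadline):
        placed = None
        # Try edge first, then cloud; run time depends on the pool's speed.
        for pool_name, pool, factor in (
            ("edge", edge_free, job.edge_slowdown),
            ("cloud", cloud_free, 1.0),
        ):
            if not pool:
                continue
            i = min(range(len(pool)), key=pool.__getitem__)
            finish = pool[i] + job.work * factor
            if finish <= job.deadline:
                pool[i] = finish
                placed = (pool_name, i, finish)
                break
        placement[job.name] = placed  # None means the deadline cannot be met
    return placement


def deadline_satisfaction_ratio(placement):
    """Fraction of jobs that were placed so as to finish by their deadline."""
    if not placement:
        return 0.0
    return sum(p is not None for p in placement.values()) / len(placement)
```

Calling `schedule(jobs, num_cloud, 0)` gives a cloud-only baseline for comparison. In this toy model, a job whose `edge_slowdown` is small enough can move to an idle edge server and free a cloud device for a job that would miss its deadline on the edge.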
Authors: GU Dian-Dian (谷典典); JIN Xin (金鑫); LIU Xuan-Zhe (刘譞哲) (School of Computer Science, Peking University, Beijing 100871, China)
Source: Journal of Software (《软件学报》), 2025, No. 12, pp. 5480-5494 (15 pages)
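Funding: National Key Research and Development Program of China (2022YFB4500700); National Science Fund for Distinguished Young Scholars (62325201); National Natural Science Foundation of China (62172008).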
Keywords: cloud-edge coordination; deep learning training; cluster scheduling; cluster management; deadline
