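Abstract: Mobile edge computing pushes part of the central cloud's computing and storage capabilities down to the network edge, so that the large volumes of data generated by terminal devices can be processed quickly at the edge. A key problem that comes with this proximity, however, is how to ensure that users always get good performance as they move between locations. To address this problem, a fast Docker-based service migration method is proposed. It migrates the image layers through incremental synchronization and migrates the container layer with checkpoint/restore technology (Checkpoint/Restore In Userspace, CRIU), thereby migrating state information and improving the end-user experience. Experimental results show that the scheme can perform live service migration and effectively reduces both the total migration time and the service downtime, thus ensuring service continuity.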
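The incremental synchronization of image layers can be illustrated with a minimal sketch. The abstract does not describe the actual mechanism, so this assumes that each layer is identified by a content digest and that the target edge node can report which digests it already holds. The function name `layers_to_transfer` and the digests are hypothetical and used only for illustration.

```python
from typing import List, Set


def layers_to_transfer(source_layers: List[str], target_layers: Set[str]) -> List[str]:
    """Return the layers the target is missing, keeping the source's layer order.

    Assumption: layers are compared by digest only; a layer already present
    at the target is never sent again.
    """
    return [digest for digest in source_layers if digest not in target_layers]


# Hypothetical digests, not taken from the paper.
source = ["sha256:aaa", "sha256:bbb", "sha256:ccc"]
target = {"sha256:aaa", "sha256:bbb"}

missing = layers_to_transfer(source, target)
print(missing)
```

Under these assumptions only the topmost layer would cross the network, which is the intuition behind a shorter total migration time. The container layer, which holds the running state, would still have to be moved separately, for example with a CRIU checkpoint.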
Funding: Supported by the National Key R&D Program of China, Grant 2023YFB3002204.
Abstract: With the convergence of supercomputing and intelligent computing, the Supercomputer Internet has been proposed to build, deploy, and run convergence applications using cloud-native technologies. Message Passing Interface (MPI) programs are a representative class of supercomputing applications in parallel computing environments. Live migration is the process of transferring a running application to a different physical location with minimal downtime, and it enables useful application management capabilities such as load balancing, resource consolidation, and fault tolerance. Although several works have studied live migration for MPI workloads, most require modifying the operating system kernel, which hinders their broader adoption in data centers. This paper uses container technology and the CRIU tool to checkpoint and restart a single container in a containerized MPI environment while ensuring that the MPI program continues to execute. The feasibility of live migration for MPI workloads is validated with the NAS Parallel Benchmarks (NPB), LAMMPS, and GROMACS. The paper discusses the impact of migration on MPI timing functions and proposes solutions. It observes a slight improvement in MPI computational performance due to migration, together with an increase in communication latency during the iterative process.
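The abstract does not say how migration affects MPI timing functions or which solutions the paper proposes. One plausible effect, shown here only as an assumption, is that a wall-clock interval spanning a checkpoint and restart also counts the time during which the process was frozen. The sketch below subtracts known frozen intervals from a measured interval. The function name `compensated_elapsed` and all timestamps are hypothetical.

```python
from typing import Iterable, Tuple


def compensated_elapsed(start: float, end: float,
                        frozen: Iterable[Tuple[float, float]]) -> float:
    """Wall-clock time between start and end, minus any overlapping frozen time.

    Assumption: the (freeze_start, freeze_end) intervals are known, do not
    overlap each other, and use the same clock as start and end.
    """
    paused = 0.0
    for freeze_start, freeze_end in frozen:
        # Only the part of the freeze that falls inside [start, end] counts.
        overlap = min(end, freeze_end) - max(start, freeze_start)
        if overlap > 0:
            paused += overlap
    return (end - start) - paused


# Hypothetical timestamps in seconds: a 15 s measurement with a 3.5 s freeze.
print(compensated_elapsed(10.0, 25.0, [(12.0, 15.5)]))
```

In this illustration the raw measurement would report 15 s, while the time actually spent running is 11.5 s. Whether the paper corrects timing in this way is not stated in the abstract.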