Funding: Supported by the National Natural Science Foundation of China (NSFC) under Grant No. 61461136001.
Abstract: In limited feedback-based cloud radio access network (C-RAN) systems, inter-cluster and intra-cluster interference, together with quantization error, can seriously degrade the system spectral efficiency. In this paper, we propose an efficient three-phase framework and corresponding algorithms to address this problem. In the first phase, a greedy scheduling algorithm based on a lower bound of the ergodic rate generates an elementary cluster. In the second phase, the proposed algorithms divide the elementary cluster into multiple smaller clusters based on short-term instantaneous information. In the third phase, two zero-forcing (ZF) precoding strategies based on the limited feedback are adopted to reduce the intra-cluster interference. Monte Carlo simulations demonstrate the effectiveness of the proposed algorithms in terms of system spectral efficiency and average user rate.
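To illustrate why quantization error leaves residual intra-cluster interference even after ZF precoding (the third phase above), the sketch below computes a textbook ZF precoder W = H^H (H H^H)^-1 from the channel information available at the transmitter and measures the interference leaked on the true channel. It is a minimal sketch under stated assumptions: the random vector quantization (RVQ) feedback model, the antenna/user/bit counts, and all function names are hypothetical, the precoder columns are not power-normalized, and the abstract does not say which two ZF strategies the authors actually use.

```python
import math
import random


def conj_transpose(a):
    """Hermitian transpose of a complex matrix stored as a list of rows."""
    return [[a[r][c].conjugate() for r in range(len(a))] for c in range(len(a[0]))]


def matmul(a, b):
    """Matrix product a @ b for list-of-rows matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def inverse(a):
    """Gauss-Jordan inverse of a square complex matrix with partial pivoting."""
    n = len(a)
    aug = [list(row) + [1 + 0j if i == j else 0j for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def zf_precoder(h):
    """ZF precoder W = H^H (H H^H)^-1 built from the K x M channel the transmitter knows."""
    hh = conj_transpose(h)
    return matmul(hh, inverse(matmul(h, hh)))


def random_unit_vector(m, rng):
    """Isotropic random unit vector in C^m."""
    v = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(m)]
    norm = math.sqrt(sum(abs(x) ** 2 for x in v))
    return [x / norm for x in v]


def rvq_feedback(h, bits, rng):
    """Assumed RVQ feedback: each user draws its own 2**bits codebook and
    feeds back the codeword best aligned with its channel direction."""
    fed_back = []
    for row in h:
        codebook = [random_unit_vector(len(row), rng) for _ in range(2 ** bits)]
        fed_back.append(max(codebook,
                            key=lambda c: abs(sum(x.conjugate() * y for x, y in zip(c, row)))))
    return fed_back


def leakage(h_true, w):
    """Total power of h_k w_j for k != j: intra-cluster interference left after precoding."""
    g = matmul(h_true, w)
    return sum(abs(g[k][j]) ** 2 for k in range(len(g)) for j in range(len(g[0])) if k != j)


if __name__ == "__main__":
    rng = random.Random(0)
    m, k, bits = 4, 3, 6  # transmit antennas, users, feedback bits per user (assumed values)
    h = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(m)] for _ in range(k)]
    print("leakage with perfect CSI: ", leakage(h, zf_precoder(h)))
    print("leakage with RVQ feedback:", leakage(h, zf_precoder(rvq_feedback(h, bits, rng))))
```

With perfect channel knowledge the leakage is zero up to floating-point error; with quantized feedback it is strictly positive and shrinks as the number of feedback bits grows, which is the effect the framework has to contend with.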
Abstract: A checkpointing scheme is proposed for interdependent distributed real-time tasks that can be scheduled as a directed acyclic graph (DAG). A typical algorithm, OSA, is selected for DAG scheduling. A new method based on a new structure, the Scheduled Cluster Tree, is presented to calculate the slack time of each task in a task cluster. For the checkpointing scheme, the optimal checkpoint intervals that minimize the approximated failure probability are derived formally and validated experimentally. The approximated failure probability has much lower computational complexity than the exact probability. Checkpoint consistency is also discussed.
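The notion of per-task slack in a DAG can be illustrated with the standard critical-path forward/backward pass. This is a swapped-in technique, not the Scheduled Cluster Tree method: the sketch assumes known execution times, a single end-to-end deadline, zero communication delay, and no processor-assignment step from OSA; all names are hypothetical.

```python
from collections import defaultdict


def topo_order(tasks, edges):
    """Kahn's algorithm; edges are (predecessor, successor) pairs."""
    indeg = {t: 0 for t in tasks}
    succs = defaultdict(list)
    for u, v in edges:
        succs[u].append(v)
        indeg[v] += 1
    ready = [t for t in tasks if indeg[t] == 0]
    order = []
    while ready:
        u = ready.pop()
        order.append(u)
        for v in succs[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    if len(order) != len(tasks):
        raise ValueError("task graph has a cycle")
    return order


def slack_times(tasks, edges, deadline):
    """Slack = latest start - earliest start for every task.

    tasks: dict mapping task -> execution time; deadline: end-to-end deadline.
    """
    preds, succs = defaultdict(list), defaultdict(list)
    for u, v in edges:
        preds[v].append(u)
        succs[u].append(v)
    order = topo_order(list(tasks), edges)
    est = {}
    for t in order:  # forward pass: earliest start times
        est[t] = max((est[p] + tasks[p] for p in preds[t]), default=0)
    lst = {}
    for t in reversed(order):  # backward pass: latest start times that still meet the deadline
        latest_finish = min((lst[s] for s in succs[t]), default=deadline)
        lst[t] = latest_finish - tasks[t]
    return {t: lst[t] - est[t] for t in order}


if __name__ == "__main__":
    tasks = {"A": 2, "B": 3, "C": 1, "D": 2}
    edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]
    print(slack_times(tasks, edges, deadline=10))
```

For this diamond-shaped example the critical path A-B-D takes 7 time units, so A, B and D each get slack 3 while C gets 5. Slack of this kind is, presumably, the time budget a checkpointing scheme can spend on checkpoint overhead and re-execution without missing the deadline.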
Funding: Supported by the National Key Research and Development Program of China under Grant No. 2021YFB0300202, the National Natural Science Foundation of China under Grant Nos. 62032023, T2125013 and 62102396, the Beijing Nova Program under Grant No. Z211100002121143, the Youth Innovation Promotion Association of Chinese Academy of Sciences under Grant No. 2021099, the Innovation Funding of Institute of Computing Technology, Chinese Academy of Sciences under Grant No. E461030, and the Tianjin Science and Technology Plan Project under Grant No. 24ZXKJGX00060.
Abstract: The escalating demand for batched deep learning inference calls for deploying multiple deep neural network (DNN) models concurrently on a shared accelerator, enabling spatial multiplexing to improve resource utilization. However, co-locating multiple model services on the same accelerator through spatial multiplexing makes scheduling within a cluster more complex: jointly optimizing model co-location combinations and resource allocation across a cluster creates a vast configuration space for scheduling. In this paper, we present a high-throughput inference system that schedules batch-oriented, heterogeneous requests on computing clusters that support spatial multiplexing. The system determines optimal scheduling configurations by jointly optimizing model co-location and resource allocation, using reinforcement learning to solve this combinatorial optimization problem. Experimental results on a large-scale cluster of 250 machine nodes with 1000 neural processing units (NPUs) show that it achieves average performance improvements of 2.2x, 1.3x, and 1.2x over three baseline systems, respectively. The system is further optimized and evaluated on mainstream GPUs, where it achieves average throughput improvements of 2.7x on the NVIDIA A100 GPU and 1.9x on the AMD MI100 GPU.
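To make the "joint configuration space" concrete, the sketch below enumerates, for a toy instance, every placement of models onto accelerators combined with every split of each accelerator's partitions among its co-located models. It deliberately uses exhaustive search rather than reinforcement learning, and everything in it is an assumption for illustration: the idea of an accelerator offering a fixed number of equal partitions, the saturating throughput model, the scoring rule (total served request rate), and all names. It only shows why the space explodes; at 250 nodes and 1000 NPUs such enumeration is infeasible, which is the motivation for a learned search.

```python
import itertools


def allocations(n_units, k):
    """Every way to split n_units partitions among k co-located models, each getting >= 1."""
    if k == 1:
        yield (n_units,)
        return
    for first in range(1, n_units - k + 2):
        for rest in allocations(n_units - first, k - 1):
            yield (first,) + rest


def served(model, units):
    """Hypothetical saturating throughput model, capped at the model's request rate."""
    return min(model["demand"], model["peak"] * units / (units + model["half"]))


def best_configuration(models, n_accels, units_per_accel):
    """Exhaustive search over placement x per-accelerator split.

    Returns (best total served rate, (placement, split), number of configurations searched).
    """
    best_score, best_cfg, n_cfgs = float("-inf"), None, 0
    for placement in itertools.product(range(n_accels), repeat=len(models)):
        groups = [[i for i, a in enumerate(placement) if a == g] for g in range(n_accels)]
        if any(len(grp) > units_per_accel for grp in groups):
            continue  # some accelerator cannot give each co-located model a partition
        options = [list(allocations(units_per_accel, len(grp))) if grp else [()] for grp in groups]
        for split in itertools.product(*options):
            n_cfgs += 1
            score = sum(served(models[i], u)
                        for grp, alloc in zip(groups, split)
                        for i, u in zip(grp, alloc))
            if score > best_score:
                best_score, best_cfg = score, (placement, split)
    return best_score, best_cfg, n_cfgs


if __name__ == "__main__":
    models = [  # hypothetical request rates and throughput curves
        {"demand": 40.0, "peak": 60.0, "half": 2.0},
        {"demand": 10.0, "peak": 30.0, "half": 1.0},
        {"demand": 25.0, "peak": 50.0, "half": 3.0},
    ]
    score, cfg, n = best_configuration(models, n_accels=2, units_per_accel=4)
    print(f"searched {n} configurations; best served rate {score:.1f} with {cfg}")
```

The number of configurations grows as (accelerators)^(models) times the number of partition splits per accelerator, so even modest clusters quickly outgrow brute force.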