Funding: the National Natural Science Foundation of China (61971066, 61941114), the Beijing Natural Science Foundation (No. L182038), and the National Youth Top-notch Talent Support Program.
Abstract: For a mobile edge computing network consisting of multiple base stations and resource-constrained user devices, network cost in terms of energy and delay is incurred when tasks are offloaded from the user to the edge server. Under limits on transmission capacity, computing resources, and connection capacity, a per-slot online learning algorithm is first proposed to minimize the time-averaged network cost. In particular, by leveraging the theories of stochastic gradient descent and minimum cost maximum flow, user association is jointly optimized with resource scheduling in each time slot. Theoretical analysis proves that the proposed approach achieves asymptotic optimality without any prior knowledge of the network environment. Moreover, to alleviate the high network overhead incurred during user handover and task migration, a two-timescale optimization approach is proposed to avoid frequent changes in user association. With user association executed on a large timescale and resource scheduling decided in each time slot, asymptotic optimality is preserved. Simulation results verify the effectiveness of the proposed online learning algorithms.
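As a rough illustration of the minimum cost maximum flow step mentioned in the abstract, the sketch below associates users with base stations under per-station connection limits. The integer cost matrix, the capacities, and the names `MinCostMaxFlow` and `associate_users` are hypothetical; the paper's actual cost model and graph construction are not given in this listing.

```python
from collections import deque


class MinCostMaxFlow:
    """Successive shortest paths with a queue-based Bellman-Ford search."""

    def __init__(self, num_nodes):
        self.num_nodes = num_nodes
        # Each edge is [head, residual capacity, cost, index of reverse edge].
        self.adj = [[] for _ in range(num_nodes)]

    def add_edge(self, tail, head, capacity, cost):
        self.adj[tail].append([head, capacity, cost, len(self.adj[head])])
        self.adj[head].append([tail, 0, -cost, len(self.adj[tail]) - 1])

    def solve(self, source, sink):
        total_flow = total_cost = 0
        while True:
            dist = [float("inf")] * self.num_nodes
            parent = [None] * self.num_nodes  # (previous node, edge index)
            queued = [False] * self.num_nodes
            dist[source] = 0
            queue = deque([source])
            while queue:
                u = queue.popleft()
                queued[u] = False
                for i, (v, cap, cost, _) in enumerate(self.adj[u]):
                    if cap > 0 and dist[u] + cost < dist[v]:
                        dist[v] = dist[u] + cost
                        parent[v] = (u, i)
                        if not queued[v]:
                            queued[v] = True
                            queue.append(v)
            if parent[sink] is None:  # no augmenting path left
                return total_flow, total_cost
            # Bottleneck capacity along the cheapest augmenting path.
            push, v = float("inf"), sink
            while v != source:
                u, i = parent[v]
                push = min(push, self.adj[u][i][1])
                v = u
            # Push the flow and update the residual capacities.
            v = sink
            while v != source:
                u, i = parent[v]
                self.adj[u][i][1] -= push
                self.adj[v][self.adj[u][i][3]][1] += push
                v = u
            total_flow += push
            total_cost += push * dist[sink]


def associate_users(cost, capacity):
    """cost[u][b]: toy integer cost of serving user u at base station b."""
    num_users, num_bs = len(cost), len(capacity)
    source, sink = 0, num_users + num_bs + 1
    net = MinCostMaxFlow(num_users + num_bs + 2)
    for u in range(num_users):
        net.add_edge(source, 1 + u, 1, 0)  # each user connects at most once
        for b in range(num_bs):
            net.add_edge(1 + u, 1 + num_users + b, 1, cost[u][b])
    for b in range(num_bs):
        net.add_edge(1 + num_users + b, sink, capacity[b], 0)  # connection limit
    _, total = net.solve(source, sink)
    assignment = [None] * num_users
    for u in range(num_users):
        for head, cap, _, _ in net.adj[1 + u]:
            if num_users < head < sink and cap == 0:  # saturated user-to-station edge
                assignment[u] = head - 1 - num_users
    return assignment, total
```

Calling `associate_users(cost, capacity)` returns one base-station index per user together with the total cost. Integer costs are used here so that the shortest-path comparisons stay exact.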
Funding: supported in part by the National Science and Technology Major Project (No. 2022ZD0116900), the National Natural Science Foundation of China (No. 52277118), and the Natural Science Foundation of Tianjin (No. 22JCZDJC00660).
Abstract: In volt/var control (VVC) for active distribution networks, it is essential to integrate traditional voltage regulation devices with modern smart photovoltaic inverters to prevent voltage violations. However, model-based multi-device VVC methods rely on accurate system models for decision-making, which can be hard to obtain because of the extensive modeling workload. To tackle the complexities of multi-device cooperation in VVC, this paper proposes a two-timescale VVC method based on reinforcement learning with a hybrid action space, termed the hybrid action representation twin delayed deep deterministic policy gradient (HAR-TD3) method. This method simultaneously manages traditional discrete voltage regulation devices, which operate on a slower timescale, and smart continuous voltage regulation devices, which operate on a faster timescale. To enable effective collaboration between the different action spaces of these devices, we propose a variational auto-encoder based hybrid action reconstruction network. This network captures the interdependencies of hybrid actions by embedding both discrete and continuous actions into a latent representation space and subsequently decoding them for action reconstruction. The proposed method is validated on IEEE 33-bus, 69-bus, and 123-bus distribution networks. Numerical results indicate that the proposed method successfully coordinates discrete and continuous voltage regulation devices, achieving fewer voltage violations than state-of-the-art reinforcement learning methods.
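To make the two-timescale structure concrete, here is a minimal control-loop sketch in which a discrete device setting is held for several fast steps while a continuous setting is refreshed at every step. It does not implement HAR-TD3 or the variational auto-encoder; the function names, the `slow_period` parameter, and the toy policies are hypothetical stand-ins for learned policies.

```python
def run_two_timescale(observations, slow_period, slow_policy, fast_policy):
    """Return the (discrete, continuous) action pair applied at each fast step."""
    applied = []
    discrete_action = None
    for t, obs in enumerate(observations):
        if t % slow_period == 0:
            # Slow timescale: pick a new discrete setting, e.g. a tap position.
            discrete_action = slow_policy(obs)
        # Fast timescale: continuous setting conditioned on the held discrete one.
        continuous_action = fast_policy(obs, discrete_action)
        applied.append((discrete_action, continuous_action))
    return applied


# Toy stand-ins (assumptions): observations are per-unit voltage magnitudes.
def toy_slow_policy(voltage):
    """Step the tap down on overvoltage, up on undervoltage, else hold at 0."""
    if voltage > 1.03:
        return -1
    if voltage < 0.97:
        return 1
    return 0


def toy_fast_policy(voltage, tap):
    """Proportional reactive-power command, clipped to [-1, 1]."""
    corrected = voltage + 0.01 * tap
    return max(-1.0, min(1.0, 10.0 * (1.0 - corrected)))
```

With `slow_period=2`, the discrete action changes only at even steps, while the continuous command reacts to every observation.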
Funding: funded by the State Key Laboratory of Massive Personalized Customization System and Technology, grant No. H&C-MPC-2023-04-01.
Abstract: This paper investigates mobility-aware online optimization for digital twin (DT)-assisted task execution in edge computing environments. In such systems, DTs, hosted on edge servers (ESs), require proactive migration to maintain proximity to their mobile physical twin (PT) counterparts. To minimize task response latency under a stringent energy consumption constraint, we jointly optimize three key components: the status data uploading frequency from the PT, the DT migration decisions, and the allocation of computational and communication resources. To address the asynchronous nature of these decisions, we propose a novel two-timescale mobility-aware online optimization (TMO) framework. The TMO scheme leverages an extended two-timescale Lyapunov optimization framework to decompose the long-term problem into sequential subproblems. At the larger timescale, a multi-armed bandit (MAB) algorithm is employed to dynamically learn the optimal status data uploading frequency. Within each shorter timescale, we first employ a gated recurrent unit (GRU)-based predictor to forecast the PT’s trajectory. Based on this prediction, an alternating minimization (AM) algorithm is then used to solve for the DT migration and resource allocation variables. Theoretical analysis confirms that the proposed TMO scheme is asymptotically optimal. Furthermore, simulation results demonstrate significant performance gains over existing benchmark methods.
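The abstract does not say which multi-armed bandit algorithm is used at the larger timescale. As one plausible illustration, the sketch below uses UCB1 to choose among candidate uploading frequencies; the class name, the candidate list, and the reward definition are all assumptions rather than details from the paper.

```python
import math


class UCB1:
    """UCB1 bandit over a finite set of arms (here: candidate upload frequencies)."""

    def __init__(self, arms):
        self.arms = list(arms)
        self.counts = [0] * len(self.arms)   # number of pulls per arm
        self.means = [0.0] * len(self.arms)  # running mean reward per arm

    def select(self):
        # Play every arm once before using confidence bounds.
        for i, n in enumerate(self.counts):
            if n == 0:
                return i
        total = sum(self.counts)
        scores = [
            mean + math.sqrt(2.0 * math.log(total) / n)
            for mean, n in zip(self.means, self.counts)
        ]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm_index, reward):
        # Incremental update of the running mean for the pulled arm.
        self.counts[arm_index] += 1
        n = self.counts[arm_index]
        self.means[arm_index] += (reward - self.means[arm_index]) / n
```

In a two-timescale loop, `select()` would be called once per large frame, and `update()` would be fed a reward in [0, 1] derived from the latency observed over that frame, for example a normalized negative latency.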
Abstract: This study proposes a two-timescale transmission scheme for extremely large-scale reconfigurable intelligent surface aided (XL-RIS-aided) massive multi-input multi-output (MIMO) systems in the presence of visibility regions (VRs). The beamforming of base stations (BSs) is designed based on rapidly changing instantaneous channel state information (CSI), while the phase shifts of the RIS are configured based on slowly varying statistical CSI. Specifically, we first formulate a system model with spatially correlated Rician fading channels and introduce the concept of VRs. Then, we derive a closed-form approximate expression for the achievable rate and analyze the impact of VRs on system performance and computational complexity. Next, we solve the problem of maximizing the minimum user rate by optimizing the phase shifts of the RIS through an algorithm based on accelerated gradient ascent. Finally, we present numerical results to validate the performance of the considered system from different aspects and reveal the low system complexity of deploying XL-RIS in massive MIMO systems with the help of VRs.
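The max-min phase-shift optimization can be sketched on a toy problem. The code below is not the paper's algorithm or rate expression: it assumes a simplified per-user gain |sum_n h[k][n] exp(j theta[n])|^2, replaces the non-smooth minimum with a log-sum-exp surrogate, and uses Nesterov-style momentum as one common form of accelerated gradient ascent. All names and parameter values are hypothetical.

```python
import cmath
import math


def user_gains(theta, channels):
    """Toy gain per user: |sum_n h[k][n] * exp(1j * theta[n])|^2."""
    phases = [cmath.exp(1j * t) for t in theta]
    return [abs(sum(h * p for h, p in zip(row, phases))) ** 2 for row in channels]


def soft_min_and_grad(theta, channels, mu):
    """Log-sum-exp surrogate of min_k gain_k and its gradient w.r.t. theta."""
    phases = [cmath.exp(1j * t) for t in theta]
    sums = [sum(h * p for h, p in zip(row, phases)) for row in channels]
    gains = [abs(s) ** 2 for s in sums]
    g_min = min(gains)
    weights = [math.exp(-mu * (g - g_min)) for g in gains]  # shifted for stability
    norm = sum(weights)
    value = g_min - math.log(norm) / mu
    grad = [0.0] * len(theta)
    for row, s, w in zip(channels, sums, weights):
        for n, (h, p) in enumerate(zip(row, phases)):
            # d|s|^2 / d theta_n = 2 * Re(conj(s) * 1j * h * exp(1j * theta_n))
            grad[n] += (w / norm) * 2.0 * (s.conjugate() * 1j * h * p).real
    return value, grad


def accelerated_ascent(channels, theta0, mu=5.0, step=0.05, iters=200):
    """Nesterov-style ascent; returns the best iterate by true minimum gain."""
    theta = list(theta0)
    lookahead = list(theta0)
    best_theta, best_min = list(theta0), min(user_gains(theta0, channels))
    t_prev = 1.0
    for _ in range(iters):
        _, grad = soft_min_and_grad(lookahead, channels, mu)
        new_theta = [y + step * g for y, g in zip(lookahead, grad)]
        t_next = (1.0 + math.sqrt(1.0 + 4.0 * t_prev ** 2)) / 2.0
        beta = (t_prev - 1.0) / t_next  # momentum coefficient
        lookahead = [a + beta * (a - b) for a, b in zip(new_theta, theta)]
        theta, t_prev = new_theta, t_next
        current_min = min(user_gains(theta, channels))
        if current_min > best_min:
            best_theta, best_min = list(theta), current_min
    return best_theta, best_min
```

Because the toy objective is non-concave and momentum is not monotone, the routine keeps the best iterate seen so far, so the returned minimum gain is never below that of the starting point.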