The coordinated optimization problem of the electricity-gas-heat integrated energy system(IES)has the characteristics of strong coupling,non-convexity,and nonlinearity.The centralized optimization method has a high co...The coordinated optimization problem of the electricity-gas-heat integrated energy system(IES)has the characteristics of strong coupling,non-convexity,and nonlinearity.The centralized optimization method has a high cost of communication and complex modeling.Meanwhile,the traditional numerical iterative solution cannot deal with uncertainty and solution efficiency,which is difficult to apply online.For the coordinated optimization problem of the electricity-gas-heat IES in this study,we constructed a model for the distributed IES with a dynamic distribution factor and transformed the centralized optimization problem into a distributed optimization problem in the multi-agent reinforcement learning environment using multi-agent deep deterministic policy gradient.Introducing the dynamic distribution factor allows the system to consider the impact of changes in real-time supply and demand on system optimization,dynamically coordinating different energy sources for complementary utilization and effectively improving the system economy.Compared with centralized optimization,the distributed model with multiple decision centers can achieve similar results while easing the pressure on system communication.The proposed method considers the dual uncertainty of renewable energy and load in the training.Compared with the traditional iterative solution method,it can better cope with uncertainty and realize real-time decision making of the system,which is conducive to the online application.Finally,we verify the effectiveness of the proposed method using an example of an IES coupled with three energy hub agents.展开更多
With the rapid growth of connected devices,traditional edge-cloud systems are under overload pressure.Using mobile edge computing(MEC)to assist unmanned aerial vehicles(UAVs)as low altitude platform stations(LAPS)for ...With the rapid growth of connected devices,traditional edge-cloud systems are under overload pressure.Using mobile edge computing(MEC)to assist unmanned aerial vehicles(UAVs)as low altitude platform stations(LAPS)for communication and computation to build air-ground integrated networks(AGINs)offers a promising solution for seamless network coverage of remote internet of things(IoT)devices in the future.To address the performance demands of future mobile devices(MDs),we proposed an MEC-assisted AGIN system.The goal is to minimize the long-term computational overhead of MDs by jointly optimizing transmission power,flight trajecto-ries,resource allocation,and offloading ratios,while utilizing non-orthogonal multiple access(NOMA)to improve device connectivity of large-scale MDs and spectral efficiency.We first designed an adaptive clustering scheme based on K-Means to cluster MDs and established commu-nication links,improving efficiency and load balancing.Then,considering system dynamics,we introduced a partial computation offloading algorithm based on multi-agent deep deterministic pol-icy gradient(MADDPG),modeling the multi-UAV computation offloading problem as a Markov decision process(MDP).This algorithm optimizes resource allocation through centralized training and distributed execution,reducing computational overhead.Simulation results show that the pro-posed algorithm not only converges stably but also outperforms other benchmark algorithms in han-dling complex scenarios with multiple devices.展开更多
针对柔性直流输电系统(voltage source converter based high voltage direct current transmission,VSC-HVDC)控制参数设计过程中存在的鲁棒性差、依赖已知电路参数、工程设计经验化等问题,提出一种基于马尔科夫转换场(Markov transiti...针对柔性直流输电系统(voltage source converter based high voltage direct current transmission,VSC-HVDC)控制参数设计过程中存在的鲁棒性差、依赖已知电路参数、工程设计经验化等问题,提出一种基于马尔科夫转换场(Markov transition field,MTF)与深度确定性策略梯度算法(deep deterministic policy gradient,DDPG)结合的鲁棒性强、不依赖电路参数特性以及可视化的VSC-HVDC控制参数优化设计方法。首先,采用马尔科夫转换场将电路功率、电压等一维时序波形数据转换为二维马尔科夫转换场域图像并使用马尔科夫转换场损失函数(Markov transition field loss,MTFL)判断二维转换域图的数据波动性;其次,将MTFL损失函数与DDPG算法相结合,综合利用MTFL损失函数对系统输出时序数据动态特性评价能力更强的优点和DDPG算法泛化性能优秀的特点,实现VSC-HVDC系统控制参数优化;最后,通过MATLAB模拟和实验结果验证该方法的有效性。展开更多
为提高多无人船编队系统的导航能力,提出了一种基于注意力机制的多智能体深度确定性策略梯度(ATMADDPG:Attention Mechanism based Multi-Agent Deep Deterministic Policy Gradient)算法。该算法在训练阶段,通过大量试验训练出最佳策略...为提高多无人船编队系统的导航能力,提出了一种基于注意力机制的多智能体深度确定性策略梯度(ATMADDPG:Attention Mechanism based Multi-Agent Deep Deterministic Policy Gradient)算法。该算法在训练阶段,通过大量试验训练出最佳策略,并在实验阶段直接使用训练出的最佳策略得到最佳编队路径。仿真实验将4艘相同的“百川号”无人船作为实验对象。实验结果表明,基于ATMADDPG算法的队形保持策略能实现稳定的多无人船编队导航,并在一定程度上满足队形保持的要求。相较于多智能体深度确定性策略梯度(MADDPG:Multi-Agent Depth Deterministic Policy Gradient)算法,所提出的ATMADDPG算法在收敛速度、队形保持能力和对环境变化的适应性等方面表现出更优越的性能,综合导航效率可提高约80%,具有较大的应用潜力。展开更多
The deep deterministic policy gradient(DDPG)algo-rithm is an off-policy method that combines two mainstream reinforcement learning methods based on value iteration and policy iteration.Using the DDPG algorithm,agents ...The deep deterministic policy gradient(DDPG)algo-rithm is an off-policy method that combines two mainstream reinforcement learning methods based on value iteration and policy iteration.Using the DDPG algorithm,agents can explore and summarize the environment to achieve autonomous deci-sions in the continuous state space and action space.In this paper,a cooperative defense with DDPG via swarms of unmanned aerial vehicle(UAV)is developed and validated,which has shown promising practical value in the effect of defending.We solve the sparse rewards problem of reinforcement learning pair in a long-term task by building the reward function of UAV swarms and optimizing the learning process of artificial neural network based on the DDPG algorithm to reduce the vibration in the learning process.The experimental results show that the DDPG algorithm can guide the UAVs swarm to perform the defense task efficiently,meeting the requirements of a UAV swarm for non-centralization,autonomy,and promoting the intelligent development of UAVs swarm as well as the decision-making process.展开更多
基金supported by The National Key R&D Program of China(2020YFB0905900):Research on artificial intelligence application of power internet of things.
文摘The coordinated optimization problem of the electricity-gas-heat integrated energy system(IES)has the characteristics of strong coupling,non-convexity,and nonlinearity.The centralized optimization method has a high cost of communication and complex modeling.Meanwhile,the traditional numerical iterative solution cannot deal with uncertainty and solution efficiency,which is difficult to apply online.For the coordinated optimization problem of the electricity-gas-heat IES in this study,we constructed a model for the distributed IES with a dynamic distribution factor and transformed the centralized optimization problem into a distributed optimization problem in the multi-agent reinforcement learning environment using multi-agent deep deterministic policy gradient.Introducing the dynamic distribution factor allows the system to consider the impact of changes in real-time supply and demand on system optimization,dynamically coordinating different energy sources for complementary utilization and effectively improving the system economy.Compared with centralized optimization,the distributed model with multiple decision centers can achieve similar results while easing the pressure on system communication.The proposed method considers the dual uncertainty of renewable energy and load in the training.Compared with the traditional iterative solution method,it can better cope with uncertainty and realize real-time decision making of the system,which is conducive to the online application.Finally,we verify the effectiveness of the proposed method using an example of an IES coupled with three energy hub agents.
基金supported by the Gansu Province Key Research and Development Plan(No.23YFGA0062)Gansu Provin-cial Innovation Fund(No.2022A-215).
文摘With the rapid growth of connected devices,traditional edge-cloud systems are under overload pressure.Using mobile edge computing(MEC)to assist unmanned aerial vehicles(UAVs)as low altitude platform stations(LAPS)for communication and computation to build air-ground integrated networks(AGINs)offers a promising solution for seamless network coverage of remote internet of things(IoT)devices in the future.To address the performance demands of future mobile devices(MDs),we proposed an MEC-assisted AGIN system.The goal is to minimize the long-term computational overhead of MDs by jointly optimizing transmission power,flight trajecto-ries,resource allocation,and offloading ratios,while utilizing non-orthogonal multiple access(NOMA)to improve device connectivity of large-scale MDs and spectral efficiency.We first designed an adaptive clustering scheme based on K-Means to cluster MDs and established commu-nication links,improving efficiency and load balancing.Then,considering system dynamics,we introduced a partial computation offloading algorithm based on multi-agent deep deterministic pol-icy gradient(MADDPG),modeling the multi-UAV computation offloading problem as a Markov decision process(MDP).This algorithm optimizes resource allocation through centralized training and distributed execution,reducing computational overhead.Simulation results show that the pro-posed algorithm not only converges stably but also outperforms other benchmark algorithms in han-dling complex scenarios with multiple devices.
文摘针对柔性直流输电系统(voltage source converter based high voltage direct current transmission,VSC-HVDC)控制参数设计过程中存在的鲁棒性差、依赖已知电路参数、工程设计经验化等问题,提出一种基于马尔科夫转换场(Markov transition field,MTF)与深度确定性策略梯度算法(deep deterministic policy gradient,DDPG)结合的鲁棒性强、不依赖电路参数特性以及可视化的VSC-HVDC控制参数优化设计方法。首先,采用马尔科夫转换场将电路功率、电压等一维时序波形数据转换为二维马尔科夫转换场域图像并使用马尔科夫转换场损失函数(Markov transition field loss,MTFL)判断二维转换域图的数据波动性;其次,将MTFL损失函数与DDPG算法相结合,综合利用MTFL损失函数对系统输出时序数据动态特性评价能力更强的优点和DDPG算法泛化性能优秀的特点,实现VSC-HVDC系统控制参数优化;最后,通过MATLAB模拟和实验结果验证该方法的有效性。
基金supported by the Key Research and Development Program of Shaanxi(2022GY-089)the Natural Science Basic Research Program of Shaanxi(2022JQ-593).
文摘The deep deterministic policy gradient(DDPG)algo-rithm is an off-policy method that combines two mainstream reinforcement learning methods based on value iteration and policy iteration.Using the DDPG algorithm,agents can explore and summarize the environment to achieve autonomous deci-sions in the continuous state space and action space.In this paper,a cooperative defense with DDPG via swarms of unmanned aerial vehicle(UAV)is developed and validated,which has shown promising practical value in the effect of defending.We solve the sparse rewards problem of reinforcement learning pair in a long-term task by building the reward function of UAV swarms and optimizing the learning process of artificial neural network based on the DDPG algorithm to reduce the vibration in the learning process.The experimental results show that the DDPG algorithm can guide the UAVs swarm to perform the defense task efficiently,meeting the requirements of a UAV swarm for non-centralization,autonomy,and promoting the intelligent development of UAVs swarm as well as the decision-making process.