Funding: Supported in part by the National Natural Science Foundation of China (No. 61906156).
Abstract: This paper studies the problem of jamming decision-making for dynamic multiple communication links in wireless communication networks (WCNs). We propose a novel jamming channel allocation and power decision-making (JCAPD) approach based on multi-agent deep reinforcement learning (MADRL). In highly dynamic, multi-target aviation communication environments, rapid channel changes make it difficult for sensors to accurately capture instantaneous channel state information, which makes centralized jamming decisions with single-agent deep reinforcement learning (DRL) approaches challenging. In response, we design a distributed multi-agent decision architecture (DMADA). We formulate multi-jammer resource allocation as a multi-agent Markov decision process (MDP) and propose a fingerprint-based double deep Q-network (FBDDQN) algorithm to solve it. In this framework, each jammer acts as an agent interacting with the environment. Through a carefully designed reward and training mechanism, our approach enables the jammers to cooperate in a distributed manner, significantly improving the jamming success rate and reducing the link transmission rate while accounting for jamming power cost. Experimental results show that the FBDDQN algorithm outperforms the baseline methods.
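As a concrete illustration of two ingredients named in the abstract above, the snippet below sketches a double-DQN target computation and a fingerprint-augmented observation. This is a minimal NumPy sketch under assumed shapes; the function and variable names are illustrative, not taken from the paper.

```python
import numpy as np

def double_dqn_target(q_online_next, q_target_next, reward, gamma, done):
    """Double DQN target: the online network selects the greedy next action,
    while the target network evaluates it, reducing Q-value overestimation."""
    a_star = int(np.argmax(q_online_next))
    return reward + (0.0 if done else gamma * float(q_target_next[a_star]))

def with_fingerprint(obs, epsilon, train_step):
    """Append a low-dimensional fingerprint (here: exploration rate and
    training iteration) to an agent's observation, so that stale replay
    samples can be disambiguated under multi-agent non-stationarity."""
    return np.concatenate([np.asarray(obs, dtype=float),
                           [epsilon, float(train_step)]])

obs = with_fingerprint([0.2, -1.0, 0.5], epsilon=0.1, train_step=42)
y = double_dqn_target(np.array([1.0, 3.0, 2.0]),
                      np.array([0.5, 0.7, 0.9]),
                      reward=1.0, gamma=0.9, done=False)
# greedy action is index 1, so y = 1.0 + 0.9 * 0.7
```

Decoupling action selection from action evaluation is what distinguishes double DQN from vanilla DQN; the fingerprint is simply extra input features, so it adds no new learning machinery.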
Funding: Funded by the National Key Research and Development Program of China under Grant 2019YFB1803301 and the Beijing Natural Science Foundation (L202002).
Abstract: The Cybertwin-enabled 6th Generation (6G) network is envisioned to support AI-native management to meet the changing demands of 6G applications. Multi-Agent Deep Reinforcement Learning (MADRL) technologies driven by Cybertwins have been proposed for adaptive task offloading strategies. However, related works do not consider the random transmission delay between Cybertwin-driven agents and the underlying networks, which destroys the standard Markov property and increases the decision reaction time, degrading task offloading performance. To address this problem, we propose a pipelined task offloading method that lowers the decision reaction time, and we model it as a delay-aware Markov Decision Process (MDP). We then design a delay-aware MADRL algorithm to minimize the weighted sum of task execution latency and energy consumption. First, the state space is augmented with the most recently received state and the historical actions to restore the Markov property. Second, Gate Transformer-XL is introduced to capture the importance of historical actions and to maintain a consistent input dimension despite the dynamic changes caused by random transmission delays. Third, a sampling method and a new loss function, built on the difference between the current and target state values and the difference between the real and augmented state-action values, are designed to obtain state transition trajectories close to the real ones. Numerical results demonstrate that the proposed methods effectively reduce reaction time and improve task offloading performance in random-delay Cybertwin-enabled 6G networks.
Abstract: In urban rail transit train communication systems, Train-to-Train (T2T) communication is the train-centric communication mode of the next-generation train control system. Compared with the traditional Train-to-Ground (T2G) mode centered on ground control equipment, T2T reduces system complexity and communication latency and improves train operating efficiency. However, to guarantee operational safety, T2T and T2G communication currently coexist in urban rail transit systems. To mitigate the interference caused by communication-link resource reuse in this coexistence scenario, this paper proposes an intelligent spectrum sharing method based on deep reinforcement learning. The method treats each T2T link as an agent and models spectrum sharing as a Multi-Agent Deep Reinforcement Learning (MADRL) problem. Moreover, because conventional deep reinforcement learning depends heavily on the experience replay buffer, low-dimensional fingerprint information characterizing each agent's action trajectory is introduced to stabilize replay. Multiple agents interact with the train communication environment in a distributed, cooperative manner, iteratively optimizing the neural network parameters so that the cumulative reward keeps improving until convergence. Finally, with the trained deep reinforcement learning model, the agents jointly select the best communication spectrum and transmit power. Simulation results in Python show that, compared with conventional deep reinforcement learning algorithms, the proposed algorithm brings the system channel capacity close to the maximum channel capacity and keeps the data transmission success rate above 90%, greatly improving train operating safety.
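The joint spectrum-and-power selection described above can be illustrated with a simple epsilon-greedy rule over a flat discrete action space. This sketch assumes one Q-value per (channel, power-level) pair, which is a common encoding for such joint decisions but is not necessarily the paper's exact design.

```python
import numpy as np

def select_channel_power(q_values, n_channels, n_power_levels, epsilon, rng):
    """Epsilon-greedy joint selection of sub-band and transmit power from a
    flat Q-vector of length n_channels * n_power_levels, illustrating how a
    single discrete action can pick both quantities at once."""
    assert q_values.shape == (n_channels * n_power_levels,)
    if rng.random() < epsilon:
        flat_action = int(rng.integers(n_channels * n_power_levels))
    else:
        flat_action = int(np.argmax(q_values))
    # decode the flat index into (channel index, power-level index)
    return divmod(flat_action, n_power_levels)

rng = np.random.default_rng(0)
q = np.array([0.1, 0.0, 0.2, 0.0, 0.3, 0.9])
channel, power = select_channel_power(q, n_channels=2, n_power_levels=3,
                                      epsilon=0.0, rng=rng)
# greedy pick: flat action 5 decodes to channel 1, power level 2
```

During training epsilon starts high and is annealed toward zero, so agents explore the joint space early and exploit the learned Q-values once training converges.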
Abstract: With the rapid advancement of deep reinforcement learning (DRL) in multi-agent systems, a variety of practical application challenges and solutions in multi-agent deep reinforcement learning (MADRL) are surfacing. Collision-free path planning is essential for many robots to perform tasks quickly and efficiently, and DRL-based path planning for multiple robots is a new research area at the intersection of robotics and artificial intelligence. In this paper, we survey the training methods for multi-robot path planning, summarize the practical applications of each method, and suggest possible directions for future research.
Funding: Supported in part by the National Natural Science Foundation of China under Grant 61861007; the Guizhou Province Science and Technology Planning Project ZK[2021]303; the Guizhou Province Science and Technology Support Plan under Grants [2022]264, [2023]096, [2023]409, and [2023]412; the Science and Technology Project of POWERCHINA Guizhou Engineering Co., Ltd. (DJ-ZDXM-2022-44); and the Project of POWERCHINA Guiyang Engineering Corporation Limited (YJ2022-12).
Abstract: This research is the first to apply Unmanned Aerial Vehicles (UAVs) equipped with Multi-access Edge Computing (MEC) servers to offshore wind farms, providing a new task offloading solution to the challenge of scarce edge servers in such environments. The proposed strategy offloads computational tasks to other MEC servers and computes them proportionally, effectively reducing the computational pressure on local MEC servers when wind turbine data are abnormal. The task offloading problem is modeled as a multi-agent deep reinforcement learning problem, and a task offloading model based on Multi-Agent Deep Reinforcement Learning (MADRL) is established. An Adaptive Genetic Algorithm (AGA) is used to explore the action space of the Deep Deterministic Policy Gradient (DDPG) algorithm, effectively addressing DDPG's slow convergence in high-dimensional action spaces. Simulation results show that the proposed AGA-DDPG algorithm saves approximately 61.8%, 55%, 21%, and 33% of the overall overhead compared with local MEC, random offloading, TD3, and DDPG, respectively. The proposed strategy is potentially important for improving real-time monitoring, big data analysis, and predictive maintenance in offshore wind farm operation and maintenance systems.
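Using a genetic-style search to explore DDPG's action space, as described above, can be sketched roughly as: mutate the actor's proposed action into a small population, score each candidate with the critic, and keep the fittest. This is an illustrative simplification of the AGA-DDPG idea; `critic_q` stands in for the learned critic, and the mutation scheme here is plain Gaussian rather than the paper's adaptive operators.

```python
import numpy as np

def ga_explore(actor_action, critic_q, pop_size=8, sigma=0.1, rng=None):
    """Genetic-style exploration around the DDPG actor's output: mutate the
    action into a small population, score candidates with the critic, and
    return the fittest. The unmutated base action stays in the pool, so the
    result is never worse (under the critic) than the actor's proposal."""
    rng = rng or np.random.default_rng(0)
    base = np.asarray(actor_action, dtype=float)
    pop = base + sigma * rng.standard_normal((pop_size, base.size))
    pop = np.clip(np.vstack([base, pop]), -1.0, 1.0)  # bounded action space
    scores = np.array([critic_q(a) for a in pop])
    return pop[int(np.argmax(scores))]

# toy critic that prefers actions near 0.3 in every dimension
critic = lambda a: -float(np.sum((np.asarray(a) - 0.3) ** 2))
best = ga_explore(np.array([0.0, 0.0]), critic)
```

Replacing DDPG's usual additive noise with a fitness-guided population search is what accelerates convergence in high-dimensional action spaces: candidates are filtered by the critic before being executed in the environment.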
Abstract: In the Internet of Vehicles, allocating spectrum resources appropriately is essential to meeting the Quality of Service (QoS) requirements of different vehicular links. To cope with high vehicle mobility and the difficulty of obtaining global state information, a resource allocation algorithm based on fully distributed Multi-Agent Deep Reinforcement Learning (MADRL) is proposed. Taking vehicular communication latency and reliability into account, the algorithm maximizes network throughput by optimizing spectrum selection and power allocation. A shared experience replay mechanism is introduced to address the non-stationarity caused by concurrently learning agents. Built on the Deep Q Network (DQN), the algorithm uses a Long Short-Term Memory (LSTM) network to capture and exploit dynamic environment information, addressing the agents' partial observability, and combines a Convolutional Neural Network (CNN) with a Residual Network (ResNet) to improve training accuracy and prediction capability. Experimental results show that the proposed algorithm satisfies the high-throughput requirement of Vehicle-to-Infrastructure (V2I) links and the low-latency requirement of Vehicle-to-Vehicle (V2V) links, and adapts well to environmental changes.
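The shared experience replay mechanism mentioned above can be sketched as a single bounded buffer that all agents write to and sample from, so each agent trains on transitions generated by its concurrently learning peers. This is a minimal illustration, not the paper's implementation; the tuple layout is assumed.

```python
import random
from collections import deque

class SharedReplayBuffer:
    """A replay buffer shared by all vehicular agents. Pooling experiences
    from concurrently learning agents exposes each one to its peers' recent
    behavior, mitigating the non-stationarity a private buffer would hide."""
    def __init__(self, capacity):
        self.buf = deque(maxlen=capacity)  # oldest transitions are evicted

    def add(self, agent_id, obs, action, reward, next_obs, done):
        self.buf.append((agent_id, obs, action, reward, next_obs, done))

    def sample(self, batch_size):
        """Uniform random minibatch, never larger than the buffer itself."""
        return random.sample(self.buf, min(batch_size, len(self.buf)))

shared = SharedReplayBuffer(capacity=10)
for step in range(12):  # 12 adds into capacity 10 evicts the 2 oldest
    shared.add(step % 4, [0.0], 0, 0.0, [0.0], False)
batch = shared.sample(4)
```

The bounded deque matters: keeping only recent transitions means sampled experience reflects the agents' current policies, which is precisely what stabilizes replay in the multi-agent setting.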