With the rapid growth of connected devices, traditional edge-cloud systems are under overload pressure. Using mobile edge computing (MEC) to assist unmanned aerial vehicles (UAVs) acting as low-altitude platform stations (LAPS) for communication and computation to build air-ground integrated networks (AGINs) offers a promising solution for seamless network coverage of remote internet of things (IoT) devices in the future. To address the performance demands of future mobile devices (MDs), we propose an MEC-assisted AGIN system. The goal is to minimize the long-term computational overhead of MDs by jointly optimizing transmission power, flight trajectories, resource allocation, and offloading ratios, while utilizing non-orthogonal multiple access (NOMA) to improve connectivity for large numbers of MDs and spectral efficiency. We first design an adaptive clustering scheme based on K-Means to cluster MDs and establish communication links, improving efficiency and load balancing. Then, considering system dynamics, we introduce a partial computation offloading algorithm based on multi-agent deep deterministic policy gradient (MADDPG), modeling the multi-UAV computation offloading problem as a Markov decision process (MDP). The algorithm optimizes resource allocation through centralized training and distributed execution, reducing computational overhead. Simulation results show that the proposed algorithm not only converges stably but also outperforms benchmark algorithms in complex scenarios with many devices.
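The K-Means clustering step can be made concrete. Below is a minimal sketch, assuming the adaptive scheme sets the number of clusters equal to the number of serving UAVs and that MDs are scattered over a square service area; the abstract does not specify the actual adaptation rule, device counts, or geometry.

```python
# Minimal sketch: cluster mobile devices (MDs) on the ground plane and
# treat each cluster center as a candidate hover point for one UAV.
# The number of clusters is assumed equal to the UAV count; the paper's
# adaptive rule for choosing it is not given in the abstract.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
md_positions = rng.uniform(0.0, 1000.0, size=(120, 2))  # 120 MDs in a 1 km x 1 km area (assumed)
n_uavs = 4                                              # assumed number of UAVs

km = KMeans(n_clusters=n_uavs, n_init=10, random_state=0).fit(md_positions)
labels = km.labels_            # cluster index of each MD
centers = km.cluster_centers_  # candidate UAV hover points

# Roughly even cluster sizes are what the load-balancing claim refers to.
sizes = np.bincount(labels, minlength=n_uavs)
print("cluster sizes:", sizes)
print("UAV hover candidates:\n", centers)
```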
With the development of artificial intelligence and related technologies, multi-agent systems such as UAV swarms are finding increasingly broad practical applications. The Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm addresses coordination among agents in cooperative environments and, owing to its distinctive Actor-Critic architecture, has become one of the mainstream algorithms in the multi-agent field. To tackle the unclear division of roles in multi-agent cooperative tasks for command decision-making and the slow policy convergence caused by information overload, an improved MADDPG algorithm incorporating a Dynamic Role Attention (DRA) mechanism, DRA-MADDPG, is proposed. The algorithm embeds a DRA module in the Actor-Critic architecture and dynamically adjusts each agent's attention weights over teammates of different roles, enabling precise optimization of the division of labor. Specifically, a role set and stage partition for command tasks are defined, from which a role-collaboration matrix and stage adjustment coefficients are constructed; a DRA module is designed in the Critic network that computes weights and filters key information based on role relevance and task stage; and the Actor network is improved to generate targeted actions according to role responsibilities. Simulation experiments show that, compared with MADDPG, DRA-MADDPG improves the area under the curve (AUC) of the cumulative training return by 2.4% and reduces task completion time by 19.3%; comparison of the training return curves further shows that DRA-MADDPG learns more efficiently over short training horizons. These results demonstrate that the method suits complex command decision-making scenarios and offers a comparatively efficient solution for multi-agent collaboration.
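The abstract does not give the DRA formulas, so the following is only a hypothetical sketch of the stated idea: weight teammates by role relevance, rescale by a task-stage coefficient, and aggregate their features before the Critic's value head. Names such as role_affinity and stage_coef are illustrative assumptions, not the paper's notation.

```python
import torch
import torch.nn.functional as F

n_agents, feat_dim = 5, 16
teammate_feats = torch.randn(n_agents, feat_dim)  # per-teammate features (assumed inputs)
role_affinity = torch.randn(n_agents)             # role-pair relevance scores (assumed learned)
stage_coef = 1.5                                  # stage adjustment coefficient (assumed)

# Attention over teammates: sharper in stages where role coordination matters more.
attn = F.softmax(stage_coef * role_affinity, dim=0)
# Weighted summary of teammate information fed to the Critic's value head.
context = (attn.unsqueeze(-1) * teammate_feats).sum(dim=0)
print(attn, context.shape)
```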
To improve the navigation capability of multi-USV (unmanned surface vehicle) formation systems, an Attention Mechanism based Multi-Agent Deep Deterministic Policy Gradient (ATMADDPG) algorithm is proposed. During training, the algorithm learns an optimal policy through a large number of trials; during testing, the trained policy is used directly to obtain the optimal formation path. The simulation experiments use four identical "Baichuan" (百川号) USVs as test subjects. The results show that the formation-keeping strategy based on ATMADDPG achieves stable multi-USV formation navigation and satisfies formation-keeping requirements to a reasonable degree. Compared with the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm, the proposed ATMADDPG algorithm performs better in convergence speed, formation-keeping ability, and adaptability to environmental changes, improving overall navigation efficiency by about 80%, and shows considerable application potential.
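ATMADDPG builds on MADDPG's centralized-training, decentralized-execution pattern: each USV's actor acts on its local observation, while a critic trained on joint observations and actions scores the team. A minimal skeleton of that pattern is sketched below; dimensions are illustrative, and the attention layer the abstract adds is omitted for brevity.

```python
import torch
import torch.nn as nn

obs_dim, act_dim, n_agents = 8, 2, 4  # illustrative sizes

class Actor(nn.Module):
    """Decentralized policy: sees only its own observation."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                 nn.Linear(64, act_dim), nn.Tanh())
    def forward(self, obs):
        return self.net(obs)

class CentralCritic(nn.Module):
    """Centralized value: scores the joint observation-action vector."""
    def __init__(self):
        super().__init__()
        joint = n_agents * (obs_dim + act_dim)
        self.net = nn.Sequential(nn.Linear(joint, 128), nn.ReLU(),
                                 nn.Linear(128, 1))
    def forward(self, all_obs, all_act):
        return self.net(torch.cat([all_obs, all_act], dim=-1))

actors = [Actor() for _ in range(n_agents)]
critic = CentralCritic()
obs = torch.randn(n_agents, obs_dim)
acts = torch.stack([a(o) for a, o in zip(actors, obs)])
q = critic(obs.flatten().unsqueeze(0), acts.flatten().unsqueeze(0))
print(q.shape)  # torch.Size([1, 1])
```

At execution time only the actors are needed, which is why the trained policy can be deployed directly, as the abstract describes.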
To address the dynamic attrition of UAV numbers in multi-UAV game confrontation, as well as the sparse-reward problem and the overly frequent sampling of uninformative experience in conventional deep reinforcement learning, this paper takes multi-UAV adversarial games under constrained attack-defense capability and communication range as the research background and builds a game confrontation model between red and blue UAV swarms. Within the Actor-Critic framework of the multi-agent deep deterministic policy gradient (MADDPG) algorithm, the original MADDPG is improved according to the characteristics of the game environment. To further improve the exploration and exploitation of useful experience, a rule-coupling module is constructed to assist the Actor network during UAV decision-making. Simulation experiments show that the proposed algorithm achieves gains in convergence speed, learning efficiency, and stability; the introduction of heterogeneous subnetworks makes the algorithm better suited to game scenarios with dynamically shrinking UAV numbers; the prioritized experience replay method coupling a reward potential function with importance weights refines the differentiation of experiences and raises the utilization of advantageous experience; and the rule-coupling module enables the UAV decision network to exploit prior knowledge effectively.
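The replay mechanism described above couples two standard ingredients: potential-based reward shaping against sparse rewards, and prioritized sampling with importance-sampling weights against over-sampling uninformative transitions. The sketch below shows assumed forms of both; the potential function phi and all constants are illustrative, not the paper's definitions.

```python
import numpy as np

rng = np.random.default_rng(0)

def shaped_reward(r, s, s_next, gamma=0.99):
    # Potential-based shaping F(s, s') = gamma * phi(s') - phi(s); here phi
    # is assumed to be negative distance to the target, a common choice.
    phi = lambda x: -np.linalg.norm(x)
    return r + gamma * phi(s_next) - phi(s)

print(shaped_reward(0.0, np.array([3.0, 4.0]), np.array([1.0, 1.0])))

# Prioritized replay: P(i) proportional to p_i**alpha, with importance
# weights w_i = (N * P(i))**(-beta), normalized by the maximum for stability.
priorities = rng.uniform(0.1, 1.0, size=1000)  # |TD error| + eps (assumed)
alpha, beta, batch = 0.6, 0.4, 32
probs = priorities**alpha / np.sum(priorities**alpha)
idx = rng.choice(len(priorities), size=batch, p=probs)
weights = (len(priorities) * probs[idx]) ** (-beta)
weights /= weights.max()
print(idx[:5], weights[:5])
```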
Funding: supported by the Gansu Province Key Research and Development Plan (No. 23YFGA0062) and the Gansu Provincial Innovation Fund (No. 2022A-215).