With accelerating urbanization, urban traffic congestion has become increasingly prominent, especially in densely populated city centers, where effective pedestrian path planning is an urgent problem. Applying reinforcement learning to multi-agent cooperative path planning can overcome the limitations of traditional agent path planning methods in complex environments. This paper proposes a multi-agent deterministic policy gradient algorithm with an improved reward mechanism (Multi-Agent Deep Deterministic Policy Gradient with Reward Enhancement, MADDPG-R). Building on the multi-agent deep deterministic policy gradient algorithm, it designs a new reward mechanism that can effectively handle complex situations in multi-agent environments while preserving the real-time performance of the system. The paper also designs a dynamic simulation scenario and carries out simulation experiments in a two-dimensional environment, verifying the effectiveness of the algorithm.
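The abstract does not give MADDPG-R's reward formula. As a minimal sketch of the kind of shaped per-agent reward such a method might use (all weights and the distance-progress/collision structure are assumptions, not the paper's design):

```python
import numpy as np

def shaped_reward(agent_pos, goal_pos, prev_dist, neighbor_positions,
                  collision_radius=0.5, w_progress=1.0, w_collision=-5.0,
                  w_time=-0.01):
    """Hypothetical shaped reward for one agent: reward progress toward
    the goal, penalize near-collisions with neighbors, and charge a small
    per-step time cost to encourage fast, real-time-friendly paths."""
    dist = np.linalg.norm(np.asarray(goal_pos, float) - np.asarray(agent_pos, float))
    reward = w_progress * (prev_dist - dist) + w_time   # progress term + time cost
    for other in neighbor_positions:                    # collision penalty term
        if np.linalg.norm(np.asarray(other, float) - np.asarray(agent_pos, float)) < collision_radius:
            reward += w_collision
    return reward, dist
```

Returning the new distance lets the caller feed it back as `prev_dist` on the next step, so the progress term stays a clean telescoping difference.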
Existing methods based on multi-agent deep deterministic policy gradient (MADDPG) have demonstrated continuous-action-space handling and heterogeneous-agent cooperation advantages in electronic countermeasures, providing effective solutions for tasks such as cooperative UAV jamming and dynamic spectrum sensing. However, the algorithm's single-environment serial sampling mechanism creates a severe efficiency bottleneck (a limited sample-collection rate leads to long training cycles, multi-core CPU resources are under-utilized, and the temporal dependency of policy updates increases reward variance), making it hard to meet the real-time decision requirements of electronic countermeasure scenarios. To address this, a parallel-sampling MADDPG (PMADDPG) algorithm is proposed. The method builds multi-instance environment containers for parallel interactive sampling, designs a thread-safe experience replay mechanism to ensure multi-thread data consistency, and adopts an asynchronous gradient update strategy to decouple the training pipeline. Experiments in the OpenAI multi-agent particle environment show that with 8 parallel environments, PMADDPG improves the sample-collection rate by 88% over the original algorithm, raises CPU utilization by a stable 41% or more, and reduces reward variance by 34.4%. The method significantly improves training efficiency for dynamic decision-making tasks in electronic countermeasures, providing efficient and reliable intelligent decision support for practical scenarios such as spectrum confrontation and UAV penetration.
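The core of PMADDPG's parallel sampling is several environment instances writing into one replay buffer concurrently. A minimal sketch of a lock-guarded, thread-safe replay fed by parallel rollout threads (the paper's actual container/replay design is not specified; the dummy transitions stand in for real `(obs, act, rew, next_obs)` tuples):

```python
import threading
import random
from collections import deque

class ThreadSafeReplay:
    """Replay buffer guarded by a lock so multiple sampler threads can
    append and sample transitions without corrupting the deque."""
    def __init__(self, capacity=100_000):
        self.buf = deque(maxlen=capacity)
        self.lock = threading.Lock()

    def add(self, transition):
        with self.lock:
            self.buf.append(transition)

    def sample(self, batch_size):
        with self.lock:
            return random.sample(self.buf, min(batch_size, len(self.buf)))

def rollout(env_id, replay, steps=100):
    """Stand-in for one parallel environment instance: a real rollout
    would step the environment and store (obs, act, rew, next_obs)."""
    for t in range(steps):
        replay.add((env_id, t))

replay = ThreadSafeReplay()
threads = [threading.Thread(target=rollout, args=(i, replay)) for i in range(8)]
for th in threads:
    th.start()
for th in threads:
    th.join()
```

With 8 threads of 100 steps each, the buffer ends up with exactly 800 transitions; the lock is what guarantees that count under concurrent appends.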
With the rapid growth of connected devices, traditional edge-cloud systems are under overload pressure. Using mobile edge computing (MEC) to assist unmanned aerial vehicles (UAVs) as low altitude platform stations (LAPS) for communication and computation to build air-ground integrated networks (AGINs) offers a promising solution for seamless network coverage of remote internet of things (IoT) devices in the future. To address the performance demands of future mobile devices (MDs), we propose an MEC-assisted AGIN system. The goal is to minimize the long-term computational overhead of MDs by jointly optimizing transmission power, flight trajectories, resource allocation, and offloading ratios, while utilizing non-orthogonal multiple access (NOMA) to improve device connectivity of large-scale MDs and spectral efficiency. We first design an adaptive clustering scheme based on K-Means to cluster MDs and establish communication links, improving efficiency and load balancing. Then, considering system dynamics, we introduce a partial computation offloading algorithm based on multi-agent deep deterministic policy gradient (MADDPG), modeling the multi-UAV computation offloading problem as a Markov decision process (MDP). This algorithm optimizes resource allocation through centralized training and distributed execution, reducing computational overhead. Simulation results show that the proposed algorithm not only converges stably but also outperforms other benchmark algorithms in handling complex scenarios with multiple devices.
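The clustering step assigns each mobile device to a serving UAV. A plain Lloyd's K-Means over device coordinates sketches the idea; the paper's "adaptive" variant presumably adds load-balancing criteria that are not specified here, so this only shows the geometric baseline:

```python
import numpy as np

def kmeans_cluster(md_positions, k, iters=50, seed=0):
    """Plain Lloyd's K-Means over mobile-device coordinates; each of the
    k clusters would be served by one UAV. Returns per-device labels and
    the final cluster centers (candidate UAV hover points)."""
    rng = np.random.default_rng(seed)
    pts = np.asarray(md_positions, dtype=float)
    # Initialize centers from k distinct devices.
    centers = pts[rng.choice(len(pts), size=k, replace=False)].copy()
    for _ in range(iters):
        # Assign each device to its nearest center.
        labels = np.argmin(np.linalg.norm(pts[:, None] - centers[None], axis=2), axis=1)
        # Move each center to the mean of its assigned devices.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = pts[labels == j].mean(axis=0)
    return labels, centers
```

On well-separated device groups this recovers one cluster per group, giving each UAV a compact service area before the MADDPG stage optimizes trajectories and offloading within it.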
With the development of artificial intelligence and related technologies, multi-agent systems such as UAV swarms are being applied in an increasingly wide range of fields. The Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm targets cooperation among multiple agents in collaborative environments and, thanks to its distinctive Actor-Critic architecture, has become one of the mainstream algorithms in the multi-agent field. To address problems in multi-agent cooperative command-decision tasks such as ambiguous role division and slow policy convergence caused by information overload, an improved MADDPG algorithm with a Dynamic Role Attention (DRA) mechanism, DRA-MADDPG, is proposed. The algorithm embeds a DRA module in the Actor-Critic architecture and dynamically adjusts each agent's attention weights toward teammates in different roles to precisely optimize the division of labor. Specifically, the role set and stage partition of the command task are defined, and a role-collaboration matrix and stage adjustment coefficients are built on them; a DRA module in the Critic network computes weights from role correlation and task stage and filters key information; and the Actor network is improved to generate targeted actions according to role responsibilities. Simulation experiments show that, compared with MADDPG, DRA-MADDPG improves the area under the cumulative training-return curve (AUC) by 2.4% and reduces task-completion time by 19.3%; comparison of the training-return curves further shows that DRA-MADDPG learns more efficiently in short-horizon training. This demonstrates that the method suits complex command-decision scenarios and offers a comparatively efficient solution for multi-agent cooperation.
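The abstract names a role-collaboration matrix and stage adjustment coefficients but not how they combine. One hedged reading, shown purely to illustrate the idea: scale an agent's row of the role matrix by the current stage's coefficient and normalize with a softmax to get attention weights over teammates (the role names, matrix values, and stage coefficients below are all invented for illustration):

```python
import numpy as np

def role_attention_weights(role_matrix, stage_coeffs, agent_role, stage):
    """Hypothetical DRA weighting: attention over teammates proportional
    to role correlation, scaled by a per-stage coefficient, then
    softmax-normalized so the weights sum to 1."""
    scores = np.asarray(role_matrix, float)[agent_role] * stage_coeffs[stage]
    exp = np.exp(scores - scores.max())   # shift for numerical stability
    return exp / exp.sum()

# Illustrative roles: 0 = commander, 1 = scout, 2 = striker.
R = [[1.0, 0.6, 0.8],
     [0.6, 1.0, 0.3],
     [0.8, 0.3, 1.0]]
w = role_attention_weights(R, stage_coeffs={"recon": 1.0, "strike": 2.0},
                           agent_role=0, stage="strike")
```

Because softmax is monotone, the commander's weights preserve the role-correlation ordering (self > striker > scout), and the stage coefficient sharpens or flattens that distribution by stage.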
To improve the navigation capability of multi-USV (unmanned surface vehicle) formation systems, an Attention Mechanism based Multi-Agent Deep Deterministic Policy Gradient (ATMADDPG) algorithm is proposed. In the training phase, the algorithm learns the best policy through extensive trials; in the test phase, the learned policy is used directly to obtain the best formation path. The simulation uses four identical "Baichuan" USVs as the experimental platform. Results show that the formation-keeping strategy based on ATMADDPG achieves stable multi-USV formation navigation and largely satisfies formation-keeping requirements. Compared with the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm, the proposed ATMADDPG performs better in convergence speed, formation-keeping ability, and adaptability to environmental changes, improving overall navigation efficiency by about 80%, showing considerable application potential.
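All five abstracts build on the same MADDPG backbone: each agent has a deterministic actor, and a centralized critic is trained on joint observations and actions. The networks themselves are omitted here; a minimal numpy sketch of the one-step TD target each agent's critic regresses toward (the standard MADDPG target, y_i = r_i + γ·Q_i′(x′, a_1′, …, a_N′) on non-terminal steps, not any one paper's variant):

```python
import numpy as np

def maddpg_critic_target(rewards, next_q_values, dones, gamma=0.95):
    """One-step TD target for each agent's centralized critic.
    rewards[i]       : agent i's reward for this transition
    next_q_values[i] : target critic's Q at the next joint state/actions
    dones[i]         : 1 if the episode ended for agent i, else 0
    Terminal steps bootstrap nothing, so the target is just the reward."""
    rewards = np.asarray(rewards, dtype=float)
    next_q = np.asarray(next_q_values, dtype=float)
    dones = np.asarray(dones, dtype=float)
    return rewards + gamma * (1.0 - dones) * next_q
```

In training, each critic minimizes the squared error between its Q-estimate and this target, while actors follow the deterministic policy gradient through their own critic; that centralized-training/decentralized-execution split is what the variants above (reward shaping, parallel sampling, role attention) all modify around.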
Funding: supported by the Gansu Province Key Research and Development Plan (No. 23YFGA0062) and the Gansu Provincial Innovation Fund (No. 2022A-215).