Journal Articles
44 articles found
MARCS: A Mobile Crowdsensing Framework Based on Data Shapley Value Enabled Multi-Agent Deep Reinforcement Learning
1
Authors: Yiqin Wang, Yufeng Wang, Jianhua Ma, Qun Jin. Computers, Materials & Continua, 2025, No. 3, pp. 4431-4449 (19 pages)
Opportunistic mobile crowdsensing (MCS), which non-intrusively exploits human mobility trajectories and uses participants' smart devices as sensors, has become a promising paradigm for various urban data acquisition tasks. However, in practice, opportunistic MCS faces several challenges from the perspectives of both MCS participants and the data platform. On the one hand, participants face uncertainties in conducting MCS tasks, including their own mobility and implicit interactions with other participants, and the economic returns the MCS data platform gives them are determined not only by their own actions but also by other participants' strategic actions. On the other hand, the platform can only observe the sensing data that participants upload, which depends on effort/actions unknown to the platform; to optimize its overall objective, the platform must properly reward certain participants to incentivize them to provide high-quality data. To address the challenge of balancing individual incentives and platform objectives in MCS, this paper proposes MARCS, an online sensing policy based on multi-agent deep reinforcement learning (MADRL) with centralized training and decentralized execution (CTDE). Specifically, the interactions between MCS participants and the data platform are modeled as a partially observable Markov game, where participants, acting as agents, use DRL-based policies to make decisions based on local observations such as task trajectories and platform payments. To align individual and platform goals effectively, the platform uses the Shapley value to estimate the contribution of each participant's sensed data and uses these estimates as immediate rewards to guide agent training. Experimental results on real mobility trajectory datasets indicate that MARCS achieves revenue almost 35%, 53%, and 100% higher than DDPG, Actor-Critic, and model predictive control (MPC), respectively, on the participant side, with similar results on the platform side, demonstrating superior performance over the baselines.
Keywords: Mobile crowdsensing; online data acquisition; data Shapley value; multi-agent deep reinforcement learning; centralized training and decentralized execution (CTDE)
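Illustrative sketch (not the MARCS authors' code): the abstract says the platform uses data Shapley values as immediate rewards but does not say how they are computed. Exact Shapley values require evaluating every coalition of participants, so a common approximation is Monte Carlo permutation sampling. Below, `monte_carlo_shapley` and the coverage-count utility `cells_covered` are hypothetical stand-ins for the platform's data-valuation function.

```python
import random
from typing import Callable, Dict, FrozenSet, Hashable, List


def monte_carlo_shapley(
    players: List[Hashable],
    utility: Callable[[FrozenSet[Hashable]], float],
    num_permutations: int = 1000,
    seed: int = 0,
) -> Dict[Hashable, float]:
    """Approximate Shapley values by averaging each player's marginal
    contribution over randomly sampled join orders."""
    rng = random.Random(seed)
    totals = {p: 0.0 for p in players}
    for _ in range(num_permutations):
        order = list(players)
        rng.shuffle(order)
        coalition = set()
        prev_value = utility(frozenset(coalition))
        for p in order:
            coalition.add(p)
            value = utility(frozenset(coalition))
            totals[p] += value - prev_value  # marginal contribution of p
            prev_value = value
    return {p: totals[p] / num_permutations for p in players}


# Hypothetical utility: number of distinct grid cells covered by the
# participants' uploaded sensing data.
coverage = {"alice": {1, 2, 3}, "bob": {3, 4}, "carol": {5}}


def cells_covered(coalition: FrozenSet[Hashable]) -> float:
    covered = set()
    for p in coalition:
        covered |= coverage[p]
    return float(len(covered))


values = monte_carlo_shapley(list(coverage), cells_covered, num_permutations=2000)
print(values)  # roughly alice 2.5, bob 1.5, carol 1.0
```

Because each sampled ordering telescopes from the empty coalition to the full one, the estimates always sum to the utility of all participants together (5 cells here); this efficiency property is what makes Shapley values a natural way to split a platform-level reward.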
A pipelining task offloading strategy via delay-aware multi-agent reinforcement learning in Cybertwin-enabled 6G network
2
Authors: Haiwen Niu, Luhan Wang, Keliang Du, Zhaoming Lu, Xiangming Wen, Yu Liu. Digital Communications and Networks, 2025, No. 1, pp. 92-105 (14 pages)
Cybertwin-enabled 6th Generation (6G) networks are envisioned to support artificial-intelligence-native management to meet the changing demands of 6G applications. Multi-Agent Deep Reinforcement Learning (MADRL) technologies driven by Cybertwins have been proposed for adaptive task offloading strategies. However, related works do not consider the random transmission delay between Cybertwin-driven agents and the underlying networks, which breaks the standard Markov property and increases decision reaction time, degrading task offloading performance. To address this problem, we propose a pipelining task offloading method that lowers the decision reaction time and model it as a delay-aware Markov Decision Process (MDP). We then design a delay-aware MADRL algorithm to minimize the weighted sum of task execution latency and energy consumption. First, the state space is augmented with the last-received state and historical actions to restore the Markov property. Second, Gate Transformer-XL is introduced to capture the importance of historical actions and to keep the input dimension consistent even though the number of historical actions changes dynamically with random transmission delays. Third, a sampling method and a new loss function, combining the difference between the current and target state values and the difference between the real and augmented state-action values, are designed to obtain state transition trajectories close to the real ones. Numerical results demonstrate that the proposed methods effectively reduce reaction time and improve task offloading performance in random-delay Cybertwin-enabled 6G networks.
Keywords: Cybertwin; multi-agent deep reinforcement learning (MADRL); task offloading; pipelining; delay-aware
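A minimal sketch of the state-augmentation step described in this abstract, assuming a fixed upper bound on the delay: the agent's input is the last observation it received plus the actions it has issued since that observation was generated. The paper uses Gate Transformer-XL to handle the variable-length action history; this sketch swaps in plain zero-padding with a validity mask instead. The class `DelayAwareStateBuffer` and its methods are hypothetical.

```python
from collections import deque
from typing import Deque, List, Sequence


class DelayAwareStateBuffer:
    """Builds an augmented state: last received observation + the actions
    issued after that observation was generated (padded to a fixed length)."""

    def __init__(self, max_delay: int, action_dim: int) -> None:
        self.max_delay = max_delay
        self.action_dim = action_dim
        self.last_obs: List[float] = []
        # Only the most recent `max_delay` actions can still be unobserved.
        self.actions: Deque[List[float]] = deque(maxlen=max_delay)

    def act(self, action: Sequence[float]) -> None:
        """Record an action sent to the (remote) network."""
        self.actions.append([float(a) for a in action])

    def observe(self, obs: Sequence[float], delay_steps: int) -> None:
        """Receive an observation generated `delay_steps` decisions ago.
        Actions issued before that point are already reflected in `obs`."""
        self.last_obs = [float(x) for x in obs]
        while len(self.actions) > delay_steps:
            self.actions.popleft()

    def augmented_state(self) -> List[float]:
        """Fixed-size vector: obs, zero-padded action history, validity mask."""
        history = list(self.actions)
        n_pad = self.max_delay - len(history)
        padded = [[0.0] * self.action_dim] * n_pad + history
        mask = [0.0] * n_pad + [1.0] * len(history)
        flat_actions = [x for a in padded for x in a]
        return self.last_obs + flat_actions + mask


buf = DelayAwareStateBuffer(max_delay=3, action_dim=2)
buf.observe([0.5, 0.1], delay_steps=0)
buf.act([1, 0])
buf.act([0, 1])
print(buf.augmented_state())
```

Padding keeps the network input the same size for every delay value; the mask tells a downstream model which action slots are real, which is the job the attention-based model performs more flexibly in the paper.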
UAV-Assisted Dynamic Avatar Task Migration for Vehicular Metaverse Services: A Multi-Agent Deep Reinforcement Learning Approach (Cited by: 1)
3
Authors: Jiawen Kang, Junlong Chen, Minrui Xu, Zehui Xiong, Yutao Jiao, Luchao Han, Dusit Niyato, Yongju Tong, Shengli Xie. IEEE/CAA Journal of Automatica Sinica [SCIE, EI, CSCD], 2024, No. 2, pp. 430-445 (16 pages)
Avatars, as promising digital representations and service assistants of users in Metaverses, can enable drivers and passengers to immerse themselves in 3D virtual services and spaces of UAV-assisted vehicular Metaverses. However, avatar tasks include a multitude of human-to-avatar and avatar-to-avatar interactive applications, e.g., augmented reality navigation, which consume intensive computing resources. It is inefficient and impractical for vehicles to process avatar tasks locally. Fortunately, migrating avatar tasks to the nearest roadside units (RSUs) or unmanned aerial vehicles (UAVs) for execution is a promising way to decrease computation overhead and reduce task processing latency, although the high mobility of vehicles makes it challenging for them to make avatar migration decisions independently based on current and future vehicle status. To address these challenges, in this paper, we propose a novel avatar task migration system based on multi-agent deep reinforcement learning (MADRL) to execute immersive vehicular avatar tasks dynamically. Specifically, we first formulate the problem of avatar task migration from vehicles to RSUs/UAVs as a partially observable Markov decision process that can be solved by MADRL algorithms. We then design the multi-agent proximal policy optimization (MAPPO) approach as the MADRL algorithm for the avatar task migration problem. To overcome slow convergence resulting from the curse of dimensionality and non-stationarity issues caused by shared parameters in MAPPO, we further propose a transformer-based MAPPO approach that uses sequential decision-making models to efficiently represent relationships among agents. Finally, to motivate terrestrial or non-terrestrial edge servers (e.g., RSUs or UAVs) to share computation resources and to ensure traceability of the sharing records, we apply smart contracts and blockchain technologies to achieve secure sharing management. Numerical results demonstrate that the proposed approach outperforms the MAPPO approach by around 2% and reduces the latency of avatar task execution by approximately 20% in UAV-assisted vehicular Metaverses.
Keywords: Avatar; blockchain; metaverses; multi-agent deep reinforcement learning; transformer; UAVs
Deep Reinforcement Learning-based Multi-Objective Scheduling for Distributed Heterogeneous Hybrid Flow Shops with Blocking Constraints
4
Authors: Xueyan Sun, Weiming Shen, Jiaxin Fan, Birgit Vogel-Heuser, Fandi Bi, Chunjiang Zhang. Engineering, 2025, No. 3, pp. 278-291 (14 pages)
This paper investigates a distributed heterogeneous hybrid blocking flow-shop scheduling problem (DHHBFSP) that aims to minimize total tardiness and total energy consumption simultaneously, and proposes an improved proximal policy optimization (IPPO) method to make real-time decisions for the DHHBFSP. A multi-objective Markov decision process is modeled for the DHHBFSP, where the reward function is represented by a vector with dynamic weights instead of the common objective-related scalar value. A factory agent (FA) is formulated for each factory to select unscheduled jobs and is trained by the proposed IPPO to improve decision quality. Multiple FAs work asynchronously to allocate jobs that arrive randomly at the shop. A two-stage training strategy is introduced in IPPO, which learns from both single- and dual-policy data for better data utilization. The proposed IPPO is tested on randomly generated instances and compared with variants of the basic proximal policy optimization (PPO), dispatching rules, multi-objective metaheuristics, and multi-agent reinforcement learning methods. Extensive experimental results suggest that the proposed strategies significantly improve on the basic PPO, and that IPPO outperforms state-of-the-art scheduling methods in both convergence and solution quality.
Keywords: Multi-objective Markov decision process; multi-agent deep reinforcement learning; proximal policy optimization; distributed hybrid flow-shop scheduling; blocking constraints
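IPPO builds on proximal policy optimization (PPO) with a vector-valued reward. As a rough illustration only, the sketch below scalarizes per-objective advantages (e.g., tardiness and energy) with a given weight vector and plugs the result into the standard PPO clipped surrogate objective. How IPPO updates its dynamic weights and runs its two-stage training is not described in the abstract, so the weights here are plain inputs and the function names are hypothetical.

```python
import math
from typing import Sequence


def scalarize(advantages: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of per-objective advantages."""
    return sum(w * a for w, a in zip(weights, advantages))


def ppo_clip_objective(
    new_logp: Sequence[float],
    old_logp: Sequence[float],
    adv_vectors: Sequence[Sequence[float]],
    weights: Sequence[float],
    clip_eps: float = 0.2,
) -> float:
    """Mean PPO clipped surrogate objective (to be maximised)."""
    total = 0.0
    for nl, ol, adv_vec in zip(new_logp, old_logp, adv_vectors):
        ratio = math.exp(nl - ol)          # pi_new(a|s) / pi_old(a|s)
        adv = scalarize(adv_vec, weights)  # vector reward -> scalar advantage
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic bound: take the smaller of the clipped/unclipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(new_logp)


# Two objectives (tardiness, energy) weighted equally for this batch.
print(ppo_clip_objective([0.0, 0.0], [0.0, 0.0], [[1.0, 3.0], [2.0, 0.0]], [0.5, 0.5]))  # 1.5
```

With a probability ratio of 2 and a positive advantage, the clip caps the gain at 1.2 times the advantage; with a negative advantage the unclipped (more pessimistic) term is kept, which is what stops a single update from moving the policy too far.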
Reward Function Design Method for Long Episode Pursuit Tasks Under Polar Coordinate in Multi-Agent Reinforcement Learning
5
Authors: DONG Yubo, CUI Tao, ZHOU Yufan, SONG Xun, ZHU Yue, DONG Peng. Journal of Shanghai Jiaotong University (Science) [EI], 2024, No. 4, pp. 646-655 (10 pages)
Multi-agent reinforcement learning has recently been applied to pursuit problems. However, it suffers from a large number of time steps per training episode and therefore often struggles to converge, resulting in low rewards and agents that fail to learn effective strategies. This paper proposes a deep reinforcement learning (DRL) training method that employs an ensemble segmented multi-reward function design to address this convergence problem. The ensemble reward function combines the advantages of two reward functions, which improves agent training in long episodes. We then eliminate the non-monotonic behavior that the trigonometric functions in the traditional 2D polar-coordinate observation representation introduce into the reward function. Experimental results demonstrate that this method outperforms the traditional single-reward-function mechanism in the pursuit scenario by improving the agents' policy scores on the task. These ideas offer a solution to the convergence challenges faced by DRL models in long-episode pursuit problems, improving model training performance.
Keywords: Multi-agent reinforcement learning; deep reinforcement learning (DRL); long episode; reward function
Tactical reward shaping for large-scale combat by multi-agent reinforcement learning
6
Authors: DUO Nanxun, WANG Qinzhao, LYU Qiang, WANG Wei. Journal of Systems Engineering and Electronics [CSCD], 2024, No. 6, pp. 1516-1529 (14 pages)
Future unmanned battles urgently require intelligent combat policies, and multi-agent reinforcement learning offers a promising solution. However, owing to the complexity of combat operations and the large size of the combat group, this task suffers from the credit assignment problem more than other reinforcement learning tasks. This study uses reward shaping to relieve the credit assignment problem and improve policy training for the new generation of large-scale unmanned combat operations. We first prove that multiple reward shaping functions do not change the Nash equilibrium in stochastic games, providing theoretical support for their use. Based on the characteristics of combat operations, we propose tactical reward shaping (TRS), which comprises maneuver shaping advice and threat-assessment-based attack shaping advice. We then investigate, through experiments, the effects of different types and combinations of shaping advice on combat policies. The results show that TRS improves both the efficiency and the attack accuracy of combat policies, with the combination of maneuver reward shaping advice and ally-focused attack shaping advice achieving the best performance relative to the baseline strategy.
Keywords: Deep reinforcement learning; multi-agent reinforcement learning; multi-agent combat; unmanned battle; reward shaping
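The TRS abstract states that adding multiple shaping functions leaves the Nash equilibrium unchanged but does not give their form. The classic sufficient condition is potential-based shaping, F(s, s') = gamma * Phi(s') - Phi(s) (Ng et al., 1999), and a sum of such terms is again potential-based. The sketch below assumes that form; the potentials `phi_maneuver` and `phi_attack` are hypothetical placeholders, not the paper's maneuver or attack advice. Its discounted sum along any trajectory telescopes to gamma^T * Phi(s_T) - Phi(s_0), which is the intuition behind the invariance result.

```python
import math
from typing import Callable, List, Sequence, Tuple

State = Tuple[float, float]  # hypothetical: (x, y) position of one unit
Potential = Callable[[State], float]
Shaping = Callable[[State, State], float]


def phi_maneuver(s: State) -> float:
    """Hypothetical: being closer to an objective point at (10, 0) is better."""
    return -math.dist(s, (10.0, 0.0))


def phi_attack(s: State) -> float:
    """Hypothetical: penalise lateral exposure |y|."""
    return -0.5 * abs(s[1])


def make_shaping(potentials: Sequence[Potential], gamma: float) -> Shaping:
    """Sum of potential-based terms F_k = gamma * phi_k(s') - phi_k(s)."""
    def shaping(s: State, s_next: State) -> float:
        return sum(gamma * phi(s_next) - phi(s) for phi in potentials)
    return shaping


def discounted_shaping(shaping: Shaping, trajectory: List[State], gamma: float) -> float:
    """Discounted sum of shaping rewards along one trajectory."""
    return sum(
        gamma ** t * shaping(trajectory[t], trajectory[t + 1])
        for t in range(len(trajectory) - 1)
    )


def phi_total(s: State) -> float:
    return phi_maneuver(s) + phi_attack(s)


gamma = 0.9
traj = [(0.0, 0.0), (3.0, 4.0), (6.0, 0.0)]
total = discounted_shaping(make_shaping([phi_maneuver, phi_attack], gamma), traj, gamma)
# Telescoping check: equals gamma**T * phi(s_T) - phi(s_0) with T = 2.
print(total, gamma ** 2 * phi_total(traj[-1]) - phi_total(traj[0]))
```

Because the shaped return depends only on the start and end states, the extra reward cannot make one policy look better than another along the way, which is why several such terms can be stacked safely.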
Automatic depth matching method of well log based on deep reinforcement learning (Cited by: 3)
7
Authors: XIONG Wenjun, XIAO Lizhi, YUAN Jiangru, YUE Wenzheng. Petroleum Exploration and Development [SCIE], 2024, No. 3, pp. 634-646 (13 pages)
In traditional well log depth matching, manual adjustments are required, which is labor-intensive when many wells are involved and leads to low work efficiency. This paper introduces a multi-agent deep reinforcement learning (MARL) method to automate the depth matching of multi-well logs. The method defines multiple top-down dual sliding windows based on a convolutional neural network (CNN) to extract and capture similar feature sequences in the well logs, and it establishes an interaction mechanism between the agents and the environment to control the depth matching process. Specifically, the agent selects an action to translate or scale the feature sequence based on the double deep Q-network (DDQN). Through the feedback of the reward signal, it evaluates the effectiveness of each action, aiming to obtain the optimal strategy and improve the accuracy of the matching task. Our experiments show that MARL can automatically perform depth matching for well logs in multiple wells and reduce manual intervention. In an oil field application, a comparative analysis of dynamic time warping (DTW), deep Q-network (DQN), and DDQN methods revealed that the DDQN algorithm, with its dual-network evaluation mechanism, significantly improves performance by identifying and aligning more details in the well log feature sequences, thus achieving higher depth matching accuracy.
Keywords: Artificial intelligence; machine learning; depth matching; well log; multi-agent deep reinforcement learning; convolutional neural network; double deep Q-network
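The depth-matching abstract credits DDQN's "dual-network evaluation mechanism" for the accuracy gain over DQN. In the standard Double DQN target, the online network chooses the next action and the target network evaluates it, which reduces the overestimation caused by plain DQN's max operator. A minimal sketch with hypothetical Q-values over a few discrete window actions (the paper's exact action set and network shapes are not given in the abstract):

```python
from typing import List, Sequence


def double_dqn_targets(
    rewards: Sequence[float],
    next_q_online: Sequence[Sequence[float]],
    next_q_target: Sequence[Sequence[float]],
    dones: Sequence[bool],
    gamma: float = 0.99,
) -> List[float]:
    """y = r + gamma * Q_target(s', argmax_a Q_online(s', a)), or y = r at episode end."""
    targets = []
    for r, q_on, q_tg, done in zip(rewards, next_q_online, next_q_target, dones):
        if done:
            targets.append(r)
            continue
        best_a = max(range(len(q_on)), key=lambda a: q_on[a])  # online net selects
        targets.append(r + gamma * q_tg[best_a])               # target net evaluates
    return targets


# Hypothetical batch: Q-values over 3 window actions (e.g. shift up, shift down, stretch).
targets = double_dqn_targets(
    rewards=[1.0, 0.5],
    next_q_online=[[0.2, 0.9, 0.1], [1.0, 0.0, 0.0]],
    next_q_target=[[0.5, 0.3, 2.0], [0.4, 3.0, 0.0]],
    dones=[False, True],
    gamma=0.9,
)
print(targets)  # plain DQN would bootstrap from max(q_target) = 2.0 for the first sample
```

Decoupling selection from evaluation means a single over-optimistic target estimate (the 2.0 above) is not automatically chosen and propagated back into training.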
Obstacle Avoidance in Multi-Agent Formation Process Based on Deep Reinforcement Learning (Cited by: 1)
8
Authors: JI Xiukun, HAI Jintao, LUO Wenguang, LIN Cuixia, XIONG Yu, OU Zengkai, WEN Jiayan. Journal of Shanghai Jiaotong University (Science) [EI], 2021, No. 5, pp. 680-685 (6 pages)
To address the difficult control-law design, poor portability, and poor stability of traditional multi-agent formation obstacle avoidance algorithms, a multi-agent formation obstacle avoidance method based on deep reinforcement learning (DRL) is proposed. This method combines the perception ability of convolutional neural networks (CNNs) with the decision-making ability of reinforcement learning in a general form and, through end-to-end learning, maps visual perception of the environment directly to control actions. The multi-agent system (MAS) model of the leader-follower formation method was designed with the wheelbarrow as the control object. An improved deep Q-network (DQN) algorithm was designed to achieve obstacle avoidance and collision avoidance while the agents move into the desired formation; the improvements adjust the discount factor and learning efficiency and add a reward function that considers the distance between each agent and the obstacles as well as a coordination factor among the agents. The simulation results show that the proposed method achieves the expected goal of multi-agent formation obstacle avoidance and is more portable than the traditional algorithm.
Keywords: Wheelbarrow; multi-agent deep reinforcement learning (DRL); formation; obstacle avoidance
Multi-Agent Deep Reinforcement Learning for Efficient Computation Offloading in Mobile Edge Computing
9
Authors: Tianzhe Jiao, Xiaoyue Feng, Chaopeng Guo, Dongqi Wang, Jie Song. Computers, Materials & Continua [SCIE, EI], 2023, No. 9, pp. 3585-3603 (19 pages)
Mobile-edge computing (MEC) is a promising technology for fifth-generation (5G) and sixth-generation (6G) architectures that provides resourceful computing capabilities for Internet of Things (IoT) applications such as virtual reality, mobile devices, and smart cities. In general, these IoT applications consume more energy than traditional applications, while the devices running them are usually energy-constrained. To sustain these devices, many studies have examined the offloading problem as a way to reduce energy consumption. However, a dynamic environment dramatically increases the difficulty of optimizing the offloading decision. In this paper, we aim to minimize the energy consumption of the entire MEC system under a latency constraint while fully considering the dynamic environment. Under Markov games, we propose a multi-agent deep reinforcement learning approach based on a bi-level actor-critic learning structure to jointly optimize the offloading decision and resource allocation; it solves the combinatorial optimization problem asymmetrically and computes the Stackelberg equilibrium, which is a better convergence point than the Nash equilibrium in terms of Pareto superiority. Our method adapts better to a dynamic environment during data transmission than a single-agent strategy and effectively tackles the coordination problem in the multi-agent environment. The simulation results show that the proposed method decreases the total computational overhead by 17.8% compared with the actor-critic-based method, and by 31.3%, 36.5%, and 44.7% compared with random offloading, all-local execution, and all-offloading execution, respectively.
Keywords: Computation offloading; multi-agent deep reinforcement learning; mobile-edge computing; latency; energy efficiency
UAV Frequency-based Crowdsensing Using Grouping Multi-agent Deep Reinforcement Learning
10
Authors: Cui ZHANG, En WANG, Funing YANG, Yong jian YANG, Nan JIANG. Computer Science (计算机科学) [CSCD, PKU Core Journal], 2023, No. 2, pp. 57-68 (12 pages)
Mobile CrowdSensing (MCS) is a promising sensing paradigm that recruits users to cooperatively perform sensing tasks. Recently, unmanned aerial vehicles (UAVs), as powerful sensing devices, have been used to replace user participation and carry out special tasks such as epidemic monitoring and earthquake rescue. In this paper, we focus on scheduling UAVs to sense task Points of Interest (PoIs) with different frequency coverage requirements. To accomplish the sensing task, the scheduling strategy must simultaneously consider the coverage requirement, geographic fairness, and energy charging. We consider the complex interactions among UAVs and propose a grouping multi-agent deep reinforcement learning approach (G-MADDPG) to schedule UAVs in a distributed manner. G-MADDPG groups all UAVs into teams using a distance-based clustering algorithm (DCA) and then treats each team as one agent. In this way, G-MADDPG solves the problem that traditional MADDPG takes too long to converge when the number of UAVs is large, and the trade-off between training time and result accuracy can be controlled flexibly by adjusting the number of teams. Extensive simulation results show that our scheduling strategy performs better than three baselines and is flexible in balancing training time and result accuracy.
Keywords: UAV; crowdsensing; frequency coverage; grouping multi-agent deep reinforcement learning
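G-MADDPG's distance-based clustering algorithm (DCA) is not specified in its abstract. As a stand-in, the sketch below groups UAV positions with plain k-means using deterministic farthest-point initialization; each resulting team would then be controlled by one agent, so fewer teams means fewer agents to train. The function `group_uavs` is hypothetical and is not the authors' DCA.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def group_uavs(positions: List[Point], num_teams: int, iters: int = 50) -> List[int]:
    """Assign each UAV a team label with k-means on 2D positions."""
    if not 1 <= num_teams <= len(positions):
        raise ValueError("num_teams must be between 1 and the number of UAVs")
    # Deterministic farthest-point initialization of team centres.
    centres: List[Point] = [positions[0]]
    while len(centres) < num_teams:
        centres.append(max(positions, key=lambda p: min(math.dist(p, c) for c in centres)))
    labels = [0] * len(positions)
    for _ in range(iters):
        # Assignment step: each UAV joins its nearest centre.
        labels = [min(range(num_teams), key=lambda k: math.dist(p, centres[k])) for p in positions]
        # Update step: move each centre to its team's mean (keep it if the team is empty).
        new_centres: List[Point] = []
        for k in range(num_teams):
            members = [p for p, lab in zip(positions, labels) if lab == k]
            if members:
                new_centres.append((sum(x for x, _ in members) / len(members),
                                    sum(y for _, y in members) / len(members)))
            else:
                new_centres.append(centres[k])
        if new_centres == centres:
            break  # converged
        centres = new_centres
    return labels


uavs = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
teams = group_uavs(uavs, num_teams=2)
print(teams)  # [0, 0, 0, 1, 1, 1]
```

Raising `num_teams` moves the setup back toward one agent per UAV (slower training, finer control); lowering it shrinks the joint action space, which mirrors the training-time versus accuracy knob the abstract describes.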
Cooperative multi-target hunting by unmanned surface vehicles based on multi-agent reinforcement learning (Cited by: 2)
11
Authors: Jiawei Xia, Yasong Luo, Zhikun Liu, Yalun Zhang, Haoran Shi, Zhong Liu. Defence Technology (防务技术) [SCIE, EI, CAS, CSCD], 2023, No. 11, pp. 80-94 (15 pages)
To solve the problem of multi-target hunting by an unmanned surface vehicle (USV) fleet, a hunting algorithm based on multi-agent reinforcement learning is proposed. First, a hunting environment and kinematic model without boundary constraints are built, and the criteria for successful target capture are given. Then, the cooperative hunting problem of a USV fleet is modeled as a decentralized partially observable Markov decision process (Dec-POMDP), and a distributed partially observable multi-target hunting Proximal Policy Optimization (DPOMH-PPO) algorithm applicable to USVs is proposed. In addition, an observation model, a reward function, and an action space suited to multi-target hunting tasks are designed. To handle the dynamically changing dimension of the observational features input by partially observable systems, a feature embedding block is proposed. By combining two feature compression methods, column-wise max pooling (CMP) and column-wise average pooling (CAP), an observational feature encoding is established. Finally, the centralized training and decentralized execution framework is adopted to train the hunting strategy. Each USV in the fleet shares the same policy and performs actions independently. Simulation experiments verify the effectiveness of the DPOMH-PPO algorithm in test scenarios with different numbers of USVs. Moreover, the advantages of the proposed model are comprehensively analyzed in terms of algorithm performance, transfer to other task scenarios, and self-organization capability after damage, which supports the potential deployment and application of DPOMH-PPO in real environments.
Keywords: Unmanned surface vehicles; multi-agent deep reinforcement learning; cooperative hunting; feature embedding; proximal policy optimization
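The DPOMH-PPO feature embedding block turns a variable number of observed entities into a fixed-size input by combining column-wise max pooling (CMP) and column-wise average pooling (CAP). A minimal sketch of that pooling step, assuming each observed entity is already a fixed-length feature row; any learned layers the paper places before or after pooling are omitted, and `embed_observations` is a hypothetical name.

```python
from typing import List, Sequence


def embed_observations(rows: Sequence[Sequence[float]], feature_dim: int) -> List[float]:
    """Concatenate column-wise max (CMP) and column-wise mean (CAP) over entity rows.

    The output length is always 2 * feature_dim, regardless of how many
    entities (targets, teammates) are currently observed."""
    if not rows:
        return [0.0] * (2 * feature_dim)  # nothing observed: all-zero encoding
    if any(len(r) != feature_dim for r in rows):
        raise ValueError("every entity row must have feature_dim features")
    columns = list(zip(*rows))
    cmp_part = [float(max(c)) for c in columns]
    cap_part = [sum(c) / len(c) for c in columns]
    return cmp_part + cap_part


print(embed_observations([[1, 2], [3, 0], [2, 4]], feature_dim=2))  # [3.0, 4.0, 2.0, 2.0]
print(embed_observations([[3, 0]], feature_dim=2))                  # [3.0, 0.0, 3.0, 0.0]
```

Both pooling operations ignore the order of the rows, so the encoding does not change when the same entities are listed differently; max pooling keeps the most extreme feature values, while averaging keeps information about the group as a whole.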
Locally generalised multi-agent reinforcement learning for demand and capacity balancing with customised neural networks (Cited by: 2)
12
Authors: Yutong CHEN, Minghua HU, Yan XU, Lei YANG. Chinese Journal of Aeronautics [SCIE, EI, CAS, CSCD], 2023, No. 4, pp. 338-353 (16 pages)
Reinforcement Learning (RL) techniques are being studied for Demand and Capacity Balancing (DCB) problems in order to fully exploit their computational performance. A locally generalised Multi-Agent Reinforcement Learning (MARL) method for real-world DCB problems is proposed. The proposed method can deploy trained agents directly to unseen scenarios in a specific Air Traffic Flow Management (ATFM) region to quickly obtain a satisfactory solution. In this method, the agents of all flights in a scenario form a multi-agent decision-making system based on partial observation. The trained agent with the customised neural network can be deployed directly on the corresponding flight, allowing the agents to solve the DCB problem jointly. A cooperation coefficient is introduced in the reward function to adjust each agent's cooperation preference in the multi-agent system, thereby controlling how flight delay time is allocated. A multi-iteration mechanism is designed for the DCB decision-making framework to deal with problems arising from non-stationarity in MARL and to ensure that all hotspots are eliminated. Experiments on large-scale, high-complexity real-world scenarios verify the effectiveness and efficiency of the method. Statistically, the proposed method is shown to generalise within the scope of the flights and sectors of interest, and its optimisation performance exceeds that of standard computer-assisted slot allocation and state-of-the-art RL-based DCB methods. A sensitivity analysis gives a preliminary view of how the cooperation coefficient affects delay time allocation.
Keywords: Air traffic flow management; demand and capacity balancing; deep Q-learning network; flight delays; generalisation; ground delay program; multi-agent reinforcement learning
Multi-objective optimization of hybrid electric vehicles energy management using multi-agent deep reinforcement learning framework
13
Authors: Xiaoyu Li, Zaihang Zhou, Changyin Wei, Xiao Gao, Yibo Zhang. Energy and AI, 2025, No. 2, pp. 287-297 (11 pages)
Hybrid electric vehicles (HEVs) produce lower emissions and less noise pollution than traditional fuel vehicles. Well-designed energy management strategies (EMSs) can effectively reduce fuel consumption and improve the fuel economy of HEVs. However, current EMSs still face problems such as complex multi-objective optimization and poor algorithm robustness. Herein, a multi-agent deep reinforcement learning (MADRL) framework based on the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm is proposed to solve these problems. First, a vehicle model and a dynamics model are established, and on this basis a multi-objective EMS is developed that considers fuel economy, maintenance of the battery state of charge (SOC), and reduction of battery degradation. Second, the proposed strategy treats the engine and the battery as two agents, which cooperate to achieve optimal power distribution and the optimal control strategy. Finally, the WLTC and HWFET driving cycles are used to verify the performance of the proposed method: fuel consumption decreases by 26.91% and 8.41% on average compared with the other strategies. The simulation results demonstrate the clear superiority of the proposed strategy in multi-objective optimization.
Keywords: Energy management strategy; hybrid electric vehicle; reinforcement learning; multi-agent deep deterministic strategy gradient
Coactive design of explainable agent-based task planning and deep reinforcement learning for human-UAVs teamwork (Cited by: 17)
14
Authors: Chang WANG, Lizhen WU, Chao YAN, Zhichao WANG, Han LONG, Chao YU. Chinese Journal of Aeronautics [SCIE, EI, CAS, CSCD], 2020, No. 11, pp. 2930-2945 (16 pages)
Unmanned Aerial Vehicles (UAVs) are useful in dangerous and dynamic tasks such as search-and-rescue, forest surveillance, and anti-terrorist operations. These tasks can be performed better through the collaboration of multiple UAVs under human supervision. However, it is still difficult for humans to monitor, understand, predict, and control the behaviors of the UAVs because of the task complexity and the black-box machine learning and planning algorithms being used. In this paper, the coactive design method is adopted to analyze the cognitive capabilities required for the tasks and to design the interdependencies among heterogeneous UAV and human teammates for coherent collaboration. Then, an agent-based task planner is proposed to automatically decompose a complex task into a sequence of explainable subtasks under constraints of resources, execution time, social rules, and costs. In addition, a deep reinforcement learning approach is designed for the UAVs to learn optimal policies for a flocking behavior and a path planner that are easy for the human operator to understand and control. Finally, a mixed-initiative action selection mechanism is used to evaluate the learned policies as well as the human's decisions. Experimental results demonstrate the effectiveness of the proposed methods.
Keywords: Coactive design; deep reinforcement learning; human-robot teamwork; mixed-initiative; multi-agent system; task planning; UAV
Deep reinforcement learning based multi-level dynamic reconfiguration for urban distribution network: a cloud-edge collaboration architecture (Cited by: 1)
15
Authors: Siyuan Jiang, Hongjun Gao, Xiaohui Wang, Junyong Liu, Kunyu Zuo. Global Energy Interconnection [EI, CAS, CSCD], 2023, No. 1, pp. 1-14 (14 pages)
With the construction of the power Internet of Things (IoT), communication between smart devices in urban distribution networks has been moving toward high speed, high compatibility, and low latency, which provides reliable support for reconfiguration optimization in urban distribution networks. This study therefore proposes a deep reinforcement learning based multi-level dynamic reconfiguration method for urban distribution networks in a cloud-edge collaboration architecture to obtain a real-time optimal multi-level dynamic reconfiguration solution. First, the multi-level dynamic reconfiguration method, covering the feeder, transformer, and substation levels, is discussed. Subsequently, the multi-agent system is combined with the cloud-edge collaboration architecture to build a deep reinforcement learning model for multi-level dynamic reconfiguration in an urban distribution network. The cloud-edge collaboration architecture effectively supports the multi-agent system in the "centralized training and decentralized execution" operation mode and improves the learning efficiency of the model. Thereafter, a combination of offline and online learning is adopted for the multi-agent system so that the model can optimize and update its strategy automatically. In the offline learning phase, a Q-learning-based multi-agent conservative Q-learning (MACQL) algorithm is proposed to stabilize the learning results and reduce the risk in the subsequent online learning phase. In the online learning phase, a policy-gradient-based multi-agent deep deterministic policy gradient (MADDPG) algorithm is proposed to explore the action space and update the experience pool. Finally, the effectiveness of the proposed method is verified through a simulation analysis of a real-world 445-node system.
Keywords: Cloud-edge collaboration architecture; multi-agent deep reinforcement learning; multi-level dynamic reconfiguration; offline learning; online learning
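The abstract names a multi-agent conservative Q-learning (MACQL) algorithm for the offline phase but gives no formulas. As a rough illustration only, the sketch below shows the standard single-agent conservative Q-learning (CQL) idea in tabular form. The usual TD loss is augmented with the penalty logsumexp over actions of Q(s, ·) minus Q(s, a_data), which pushes down the Q-values of actions that do not appear in the offline data. The tabular setting, the weight `alpha`, the learning rate, and all function names are assumptions. This is not the authors' MACQL, and the multi-agent extension is not modeled.

```python
import math

def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def cql_penalty(q_row, action):
    """CQL regularizer for one state: logsumexp_a Q(s,a) - Q(s, a_data); always >= 0."""
    return logsumexp(q_row) - q_row[action]

def cql_update(q, transition, alpha=1.0, gamma=0.99, lr=0.1):
    """One gradient step on a tabular CQL objective for one offline transition.

    q: dict mapping state -> list of Q-values (one per action)
    transition: (s, a, r, s_next, done) drawn from a fixed offline dataset
    """
    s, a, r, s_next, done = transition
    q_row = q[s]
    # Ordinary Q-learning target, held fixed during this step.
    target = r if done else r + gamma * max(q[s_next])
    td_error = q_row[a] - target
    # d/dQ(s,b) of alpha * (logsumexp Q(s,.) - Q(s,a)) is alpha * (softmax_b - 1[b == a]).
    lse = logsumexp(q_row)
    grads = []
    for b, qb in enumerate(q_row):
        g = alpha * (math.exp(qb - lse) - (1.0 if b == a else 0.0))
        if b == a:
            g += td_error  # gradient of 0.5 * td_error**2
        grads.append(g)
    q[s] = [qb - lr * g for qb, g in zip(q_row, grads)]
    return q
```

Repeatedly applying `cql_update` to a dataset that only contains action 1 in state `s0` raises Q(s0, 1) and lowers the unseen actions. This conservatism is what the offline phase relies on to reduce risk before online learning.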
Applications and Challenges of Deep Reinforcement Learning in Multi-robot Path Planning (Cited by: 1)
16
Authors: Tianyun Qiu, Yaxuan Cheng. Journal of Electronic Research and Application, 2021, Issue 6, pp. 25-29 (5 pages)
With the rapid advancement of deep reinforcement learning (DRL) in multi-agent systems, a variety of practical application challenges and solutions in multi-agent deep reinforcement learning (MADRL) are emerging. Path planning in a collision-free environment is essential for many robots to complete tasks quickly and efficiently, and multi-robot path planning using deep reinforcement learning is a new research area in robotics and artificial intelligence. In this paper, we survey the training methods for multi-robot path planning, summarize the practical applications of DRL-based multi-robot path planning according to these methods, and finally suggest possible research directions.
Keywords: MADRL; deep reinforcement learning; multi-agent system; multi-robot; path planning
Deep Reinforcement Learning for Addressing Disruptions in Traffic Light Control
17
Authors: Faizan Rasheed, Kok-Lim Alvin Yau, Rafidah Md Noor, Yung-Wey Chong. Computers, Materials & Continua (SCIE, EI), 2022, Issue 5, pp. 2225-2247 (23 pages)
This paper investigates the use of a multi-agent deep Q-network (MADQN) to address the curse of dimensionality that arises in the traditional multi-agent reinforcement learning (MARL) approach. The proposed MADQN is applied to traffic light controllers at multiple intersections with busy traffic and traffic disruptions, particularly rainfall. MADQN is based on the deep Q-network (DQN), which integrates traditional reinforcement learning (RL) with the newly emerging deep learning (DL) approach. MADQN enables traffic light controllers to learn, exchange knowledge with neighboring agents, and select optimal joint actions collaboratively. A case study based on a real traffic network is conducted as part of a sustainable urban city project in Sunway City, Kuala Lumpur, Malaysia. A grid traffic network (GTN) is also investigated to verify that the proposed scheme is effective in a traditional traffic network. The proposed scheme is evaluated using two simulation tools, namely MATLAB and Simulation of Urban Mobility (SUMO). In the simulations, the proposed scheme reduced the cumulative delay of vehicles by up to 30%.
Keywords: Artificial intelligence; traffic light control; traffic disruptions; multi-agent deep Q-network; deep reinforcement learning
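MADQN builds on the deep Q-network (DQN). As a minimal sketch of that DQN ingredient only, the functions below compute the standard one-step TD targets r + γ·max_a' Q_target(s', a') for a batch and the mean squared TD loss. Terminal transitions do not bootstrap. The abstract does not describe how agents exchange knowledge with neighbors or choose joint actions, so that part is not modeled here. The function names and the discount factor are illustrative assumptions.

```python
from typing import List, Sequence

def dqn_targets(rewards: Sequence[float],
                next_q_target: Sequence[Sequence[float]],
                dones: Sequence[bool],
                gamma: float = 0.99) -> List[float]:
    """Standard DQN targets: y = r + gamma * max_a' Q_target(s', a'), or y = r at episode end."""
    targets = []
    for r, q_next, done in zip(rewards, next_q_target, dones):
        bootstrap = 0.0 if done else gamma * max(q_next)
        targets.append(r + bootstrap)
    return targets

def td_loss(q_taken: Sequence[float], targets: Sequence[float]) -> float:
    """Mean squared TD error between Q(s, a) of the taken actions and their targets."""
    return sum((q - y) ** 2 for q, y in zip(q_taken, targets)) / len(targets)
```

For example, with `gamma=0.5`, rewards `[-2.0, -1.0]`, next-state target Q-values `[[0.5, 1.0], [3.0, 0.0]]`, and `dones=[False, True]`, the targets are `[-1.5, -1.0]`. The second transition is terminal, so its large next-state value is ignored.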
Multi-Agent Path Planning Method Based on Improved Deep Q-Network in Dynamic Environments (Cited by: 1)
18
Authors: LI Shuyi, LI Minzhe, JING Zhongliang. Journal of Shanghai Jiaotong University (Science) (EI), 2024, Issue 4, pp. 601-612 (12 pages)
The multi-agent path planning problem presents significant challenges in dynamic environments, primarily due to the ever-changing positions of obstacles and the complex interactions between agents' actions. These factors cause solutions to converge slowly and, in some cases, to diverge altogether. To address this issue, this paper introduces a novel approach using a double dueling deep Q-network (D3QN) tailored for dynamic multi-agent environments. A novel reward function based on multi-agent positional constraints is designed, and a training strategy based on incremental learning is adopted to achieve collaborative path planning of multiple agents. Moreover, greedy and Boltzmann probability selection policies are introduced for action selection to avoid convergence to a local extremum. To match radar and image sensors, a convolutional neural network-long short-term memory (CNN-LSTM) architecture is constructed to extract features from multi-source measurements as the input of the D3QN. The algorithm's efficacy and reliability are validated in a simulated environment built with the Robot Operating System (ROS) and Gazebo. The simulation results show that the proposed algorithm provides a real-time solution for path planning tasks in dynamic scenarios. In terms of average success rate and accuracy, the proposed method outperforms other deep learning algorithms, and its convergence speed is also improved.
Keywords: multi-agent path planning; deep reinforcement learning; deep Q-network
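D3QN combines two standard DQN refinements. The first is a dueling head, Q(s,a) = V(s) + A(s,a) − mean_a' A(s,a'). The second is the double-DQN target, in which the online network picks the next action and the target network evaluates it. The abstract also mentions greedy and Boltzmann action selection. The sketch below shows these pieces on plain Python lists. The CNN-LSTM feature extractor and the positional-constraint reward are not modeled. The way greedy and Boltzmann selection are mixed in `select_action`, the temperature, and all function names are assumptions, not the authors' code.

```python
import math
import random
from typing import List, Sequence

def dueling_q(value: float, advantages: Sequence[float]) -> List[float]:
    """Dueling aggregation: Q(s,a) = V(s) + A(s,a) - mean_a' A(s,a')."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]

def greedy_action(q_values: Sequence[float]) -> int:
    """Index of the largest Q-value."""
    return max(range(len(q_values)), key=lambda i: q_values[i])

def double_dqn_target(reward: float, done: bool,
                      q_online_next: Sequence[float],
                      q_target_next: Sequence[float],
                      gamma: float = 0.99) -> float:
    """Double DQN: the online net selects a', the target net evaluates Q_target(s', a')."""
    if done:
        return reward
    best_next = greedy_action(q_online_next)
    return reward + gamma * q_target_next[best_next]

def boltzmann_action(q_values: Sequence[float], temperature: float,
                     rng: random.Random) -> int:
    """Sample action a with probability proportional to exp(Q(s,a) / temperature)."""
    m = max(q_values)  # subtract the max for numerical stability
    weights = [math.exp((q - m) / temperature) for q in q_values]
    return rng.choices(range(len(q_values)), weights=weights, k=1)[0]

def select_action(q_values: Sequence[float], explore_prob: float,
                  temperature: float, rng: random.Random) -> int:
    """Hypothetical mix: explore with Boltzmann sampling, otherwise act greedily."""
    if rng.random() < explore_prob:
        return boltzmann_action(q_values, temperature, rng)
    return greedy_action(q_values)
```

Compared with a uniform random exploration step, Boltzmann sampling still favors actions with higher estimated value. This is one common way to keep exploring without abandoning promising actions, which relates to the abstract's goal of avoiding local extrema.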
Optimization control method for dedicated outdoor air system in multi-zone office buildings based on deep reinforcement learning
19
Authors: Xudong Tang, Ling Zhang, Yongqiang Luo. Building Simulation, 2025, Issue 4, pp. 881-896 (16 pages)
Heating, ventilation, and air conditioning (HVAC) systems consume a significant amount of energy to maintain thermal comfort and indoor air quality in buildings, which results in high operational costs. Reinforcement learning is an effective method for controlling HVAC systems. However, in large and complex HVAC systems, traditional reinforcement learning algorithms often suffer from slow training and poor convergence. This paper proposes a multi-objective optimization control method based on the multi-agent deep deterministic policy gradient (MADDPG) algorithm, which aims to minimize HVAC energy consumption while ensuring optimal thermal comfort and indoor air quality in each zone. Using a multi-zone office building with fan coil units and a dedicated outdoor air system as a case study, we developed an EnergyPlus-Python co-simulation platform. The proposed control method was employed during both the heating and cooling seasons to independently control the temperature setpoints and fresh airflow in different zones of the office building. The simulation results from both seasons demonstrate that the MADDPG control method exhibits faster convergence during training and excellent learning capabilities, allowing it to adapt effectively to changes in environmental conditions and implement appropriate control actions. Under similar indoor thermal comfort and air quality conditions, the MADDPG control method consumes less energy than the traditional reinforcement learning method, and compared with the rule-based control method it saves 24.1% of energy during the heating season and 8.9% during the cooling season. Additionally, by adjusting the reward function in the MADDPG algorithm, it is possible to flexibly balance energy consumption, thermal comfort, and air quality preferences, demonstrating the algorithm's strong applicability.
Keywords: multi-zone HVAC systems; energy consumption; thermal comfort; indoor air quality; multi-agent deep reinforcement learning
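The abstract states that adjusting the reward function lets MADDPG trade off energy consumption, thermal comfort, and air quality. The paper's actual reward is not given. The following is only a hypothetical per-zone reward of that kind: a weighted, negated sum of energy use, temperature deviation outside a comfort band, and CO2 concentration above a limit. The weights, the 21-24 °C band, the 1000 ppm CO2 limit, and the names `RewardWeights` and `zone_reward` are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class RewardWeights:
    energy: float = 1.0   # penalty per kWh (hypothetical weight)
    comfort: float = 1.0  # penalty per degC outside the comfort band (hypothetical)
    air: float = 0.01     # penalty per ppm of CO2 above the limit (hypothetical)

def zone_reward(energy_kwh: float, zone_temp_c: float, co2_ppm: float,
                w: Optional[RewardWeights] = None,
                band: Tuple[float, float] = (21.0, 24.0),
                co2_limit_ppm: float = 1000.0) -> float:
    """Hypothetical multi-objective reward for one zone agent (higher is better)."""
    if w is None:
        w = RewardWeights()
    low, high = band
    # Distance outside the comfort band; zero when the zone is inside it.
    temp_violation = max(low - zone_temp_c, 0.0) + max(zone_temp_c - high, 0.0)
    co2_excess = max(co2_ppm - co2_limit_ppm, 0.0)
    return -(w.energy * energy_kwh + w.comfort * temp_violation + w.air * co2_excess)
```

In this sketch, raising `RewardWeights.comfort` makes a degree of discomfort cost more relative to a kWh of energy. That is the kind of knob the abstract describes for balancing the three objectives.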
Learning-based user association and dynamic resource allocation in multi-connectivity enabled unmanned aerial vehicle networks
20
Authors: Zhipeng Cheng, Minghui Liwang, Ning Chen, Lianfen Huang, Nadra Guizani, Xiaojiang Du. Digital Communications and Networks (SCIE, CSCD), 2024, Issue 1, pp. 53-62 (10 pages)
Using Unmanned Aerial Vehicles (UAVs) as aerial base stations to provide communication services for ground users is a flexible and cost-effective paradigm in B5G. Besides, dynamic resource allocation and multi-connectivity can be adopted to further harness the potential of UAVs in improving communication capacity; in such situations, interference among users becomes a pivotal obstacle that requires effective solutions. To this end, we investigate the Joint UAV-User Association, Channel Allocation, and transmission Power Control (J-UACAPC) problem in a multi-connectivity-enabled UAV network with constrained backhaul links, where each UAV can determine the reusable channels and transmission power to serve the selected ground users. The goal was to mitigate co-channel interference while maximizing long-term system utility. The problem was modeled as a cooperative stochastic game with a hybrid discrete-continuous action space. A Multi-Agent Hybrid Deep Reinforcement Learning (MAHDRL) algorithm was proposed to address this problem. Extensive simulation results demonstrated the effectiveness of the proposed algorithm and showed that it achieves higher system utility than the baseline methods.
Keywords: UAV-user association; Multi-connectivity; Resource allocation; Power control; multi-agent deep reinforcement learning
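The J-UACAPC problem mixes discrete choices (user association and which channel to reuse) with a continuous one (transmit power), and co-channel interference couples the UAVs. The sketch below is a hedged illustration and not the MAHDRL algorithm. It decodes a hypothetical hybrid action: discrete logits give the channel choice, and a raw continuous output is squashed into (0, P_max). It then computes each link's SINR, counting interference only from links on the same channel. The action encoding, the gain matrix layout, the noise power, and the sum-rate utility are assumptions.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence

@dataclass
class Link:
    channel: int    # discrete part of the action: reused channel index
    power_w: float  # continuous part of the action: transmit power in watts

def decode_hybrid_action(channel_logits: Sequence[float], raw_power: float,
                         p_max_w: float) -> Link:
    """Hypothetical decoding: argmax over channel logits, sigmoid-scaled power."""
    channel = max(range(len(channel_logits)), key=lambda c: channel_logits[c])
    power = p_max_w / (1.0 + math.exp(-raw_power))  # sigmoid keeps power in (0, p_max_w)
    return Link(channel, power)

def sinr(links: Sequence[Link], gains: Sequence[Sequence[float]],
         noise_w: float) -> List[float]:
    """SINR per link; gains[j][k] is the gain from link j's UAV to link k's user."""
    out = []
    for k, lk in enumerate(links):
        # Only transmitters on the same channel cause co-channel interference.
        interference = sum(lj.power_w * gains[j][k]
                           for j, lj in enumerate(links)
                           if j != k and lj.channel == lk.channel)
        out.append(lk.power_w * gains[k][k] / (interference + noise_w))
    return out

def sum_rate(sinrs: Sequence[float]) -> float:
    """One possible utility (assumption): total spectral efficiency in bit/s/Hz."""
    return sum(math.log2(1.0 + s) for s in sinrs)
```

Moving one of two interfering links to another channel removes its contribution from the other link's denominator. This is the effect a learned channel-allocation policy can exploit, while the power output trades a link's own SINR against the interference it causes.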