Wi-Fi technology has evolved significantly since its introduction in 1997,advancing to Wi-Fi 6 as the latest standard,with Wi-Fi 7 currently under development.Despite these advancements,integrating machine learning in...Wi-Fi technology has evolved significantly since its introduction in 1997,advancing to Wi-Fi 6 as the latest standard,with Wi-Fi 7 currently under development.Despite these advancements,integrating machine learning into Wi-Fi networks remains challenging,especially in decentralized environments with multiple access points(mAPs).This paper is a short review that summarizes the potential applications of federated reinforcement learning(FRL)across eight key areas of Wi-Fi functionality,including channel access,link adaptation,beamforming,multi-user transmissions,channel bonding,multi-link operation,spatial reuse,and multi-basic servic set(multi-BSS)coordination.FRL is highlighted as a promising framework for enabling decentralized training and decision-making while preserving data privacy.To illustrate its role in practice,we present a case study on link activation in a multi-link operation(MLO)environment with multiple APs.Through theoretical discussion and simulation results,the study demonstrates how FRL can improve performance and reliability,paving the way for more adaptive and collaborative Wi-Fi networks in the era of Wi-Fi 7 and beyond.展开更多
Addressing optimal confrontation methods in multi-agent attack-defense scenarios is a complex challenge.Multi-Agent Reinforcement Learning(MARL)provides an effective framework for tackling sequential decision-making p...Addressing optimal confrontation methods in multi-agent attack-defense scenarios is a complex challenge.Multi-Agent Reinforcement Learning(MARL)provides an effective framework for tackling sequential decision-making problems,significantly enhancing swarm intelligence in maneuvering.However,applying MARL to unmanned swarms presents two primary challenges.First,defensive agents must balance autonomy with collaboration under limited perception while coordinating against adversaries.Second,current algorithms aim to maximize global or individual rewards,making them sensitive to fluctuations in enemy strategies and environmental changes,especially when rewards are sparse.To tackle these issues,we propose an algorithm of MultiAgent Reinforcement Learning with Layered Autonomy and Collaboration(MARL-LAC)for collaborative confrontations.This algorithm integrates dual twin Critics to mitigate the high variance associated with policy gradients.Furthermore,MARL-LAC employs layered autonomy and collaboration to address multi-objective problems,specifically learning a global reward function for the swarm alongside local reward functions for individual defensive agents.Experimental results demonstrate that MARL-LAC enhances decision-making and collaborative behaviors among agents,outperforming the existing algorithms and emphasizing the importance of layered autonomy and collaboration in multi-agent systems.The observed adversarial behaviors demonstrate that agents using MARL-LAC effectively maintain cohesive formations that conceal their intentions by confusing the offensive agent while successfully encircling the target.展开更多
Adversarial Reinforcement Learning(ARL)models for intelligent devices and Network Intrusion Detection Systems(NIDS)improve systemresilience against sophisticated cyber-attacks.As a core component of ARL,Adversarial Tr...Adversarial Reinforcement Learning(ARL)models for intelligent devices and Network Intrusion Detection Systems(NIDS)improve systemresilience against sophisticated cyber-attacks.As a core component of ARL,Adversarial Training(AT)enables NIDS agents to discover and prevent newattack paths by exposing them to competing examples,thereby increasing detection accuracy,reducing False Positives(FPs),and enhancing network security.To develop robust decision-making capabilities for real-world network disruptions and hostile activity,NIDS agents are trained in adversarial scenarios to monitor the current state and notify management of any abnormal or malicious activity.The accuracy and timeliness of the IDS were crucial to the network’s availability and reliability at this time.This paper analyzes ARL applications in NIDS,revealing State-of-The-Art(SoTA)methodology,issues,and future research prospects.This includes Reinforcement Machine Learning(RML)-based NIDS,which enables an agent to interact with the environment to achieve a goal,andDeep Reinforcement Learning(DRL)-based NIDS,which can solve complex decision-making problems.Additionally,this survey study addresses cybersecurity adversarial circumstances and their importance for ARL and NIDS.Architectural design,RL algorithms,feature representation,and training methodologies are examined in the ARL-NIDS study.This comprehensive study evaluates ARL for intelligent NIDS research,benefiting cybersecurity researchers,practitioners,and policymakers.The report promotes cybersecurity defense research and innovation.展开更多
Unmanned Aerial Vehicles(UAVs)have become integral components in smart city infrastructures,supporting applications such as emergency response,surveillance,and data collection.However,the high mobility and dynamic top...Unmanned Aerial Vehicles(UAVs)have become integral components in smart city infrastructures,supporting applications such as emergency response,surveillance,and data collection.However,the high mobility and dynamic topology of Flying Ad Hoc Networks(FANETs)present significant challenges for maintaining reliable,low-latency communication.Conventional geographic routing protocols often struggle in situations where link quality varies and mobility patterns are unpredictable.To overcome these limitations,this paper proposes an improved routing protocol based on reinforcement learning.This new approach integrates Q-learning with mechanisms that are both link-aware and mobility-aware.The proposed method optimizes the selection of relay nodes by using an adaptive reward function that takes into account energy consumption,delay,and link quality.Additionally,a Kalman filter is integrated to predict UAV mobility,improving the stability of communication links under dynamic network conditions.Simulation experiments were conducted using realistic scenarios,varying the number of UAVs to assess scalability.An analysis was conducted on key performance metrics,including the packet delivery ratio,end-to-end delay,and total energy consumption.The results demonstrate that the proposed approach significantly improves the packet delivery ratio by 12%–15%and reduces delay by up to 25.5%when compared to conventional GEO and QGEO protocols.However,this improvement comes at the cost of higher energy consumption due to additional computations and control overhead.Despite this trade-off,the proposed solution ensures reliable and efficient communication,making it well-suited for large-scale UAV networks operating in complex urban environments.展开更多
Vehicle Edge Computing(VEC)and Cloud Computing(CC)significantly enhance the processing efficiency of delay-sensitive and computation-intensive applications by offloading compute-intensive tasks from resource-constrain...Vehicle Edge Computing(VEC)and Cloud Computing(CC)significantly enhance the processing efficiency of delay-sensitive and computation-intensive applications by offloading compute-intensive tasks from resource-constrained onboard devices to nearby Roadside Unit(RSU),thereby achieving lower delay and energy consumption.However,due to the limited storage capacity and energy budget of RSUs,it is challenging to meet the demands of the highly dynamic Internet of Vehicles(IoV)environment.Therefore,determining reasonable service caching and computation offloading strategies is crucial.To address this,this paper proposes a joint service caching scheme for cloud-edge collaborative IoV computation offloading.By modeling the dynamic optimization problem using Markov Decision Processes(MDP),the scheme jointly optimizes task delay,energy consumption,load balancing,and privacy entropy to achieve better quality of service.Additionally,a dynamic adaptive multi-objective deep reinforcement learning algorithm is proposed.Each Double Deep Q-Network(DDQN)agent obtains rewards for different objectives based on distinct reward functions and dynamically updates the objective weights by learning the value changes between objectives using Radial Basis Function Networks(RBFN),thereby efficiently approximating the Pareto-optimal decisions for multiple objectives.Extensive experiments demonstrate that the proposed algorithm can better coordinate the three-tier computing resources of cloud,edge,and vehicles.Compared to existing algorithms,the proposed method reduces task delay and energy consumption by 10.64%and 5.1%,respectively.展开更多
At present,energy consumption is one of the main bottlenecks in autonomous mobile robot development.To address the challenge of high energy consumption in path planning for autonomous mobile robots navigating unknown ...At present,energy consumption is one of the main bottlenecks in autonomous mobile robot development.To address the challenge of high energy consumption in path planning for autonomous mobile robots navigating unknown and complex environments,this paper proposes an Attention-Enhanced Dueling Deep Q-Network(ADDueling DQN),which integrates a multi-head attention mechanism and a prioritized experience replay strategy into a Dueling-DQN reinforcement learning framework.A multi-objective reward function,centered on energy efficiency,is designed to comprehensively consider path length,terrain slope,motion smoothness,and obstacle avoidance,enabling optimal low-energy trajectory generation in 3D space from the source.The incorporation of a multihead attention mechanism allows the model to dynamically focus on energy-critical state features—such as slope gradients and obstacle density—thereby significantly improving its ability to recognize and avoid energy-intensive paths.Additionally,the prioritized experience replay mechanism accelerates learning from key decision-making experiences,suppressing inefficient exploration and guiding the policy toward low-energy solutions more rapidly.The effectiveness of the proposed path planning algorithm is validated through simulation experiments conducted in multiple off-road scenarios.Results demonstrate that AD-Dueling DQN consistently achieves the lowest average energy consumption across all tested environments.Moreover,the proposed method exhibits faster convergence and greater training stability compared to baseline algorithms,highlighting its global optimization capability under energy-aware objectives in complex terrains.This study offers an efficient and scalable intelligent control strategy for the development of energy-conscious autonomous navigation systems.展开更多
Theintegration of human factors into artificial intelligence(AI)systems has emerged as a critical research frontier,particularly in reinforcement learning(RL),where human-AI interaction(HAII)presents both opportunitie...Theintegration of human factors into artificial intelligence(AI)systems has emerged as a critical research frontier,particularly in reinforcement learning(RL),where human-AI interaction(HAII)presents both opportunities and challenges.As RL continues to demonstrate remarkable success in model-free and partially observable environments,its real-world deployment increasingly requires effective collaboration with human operators and stakeholders.This article systematically examines HAII techniques in RL through both theoretical analysis and practical case studies.We establish a conceptual framework built upon three fundamental pillars of effective human-AI collaboration:computational trust modeling,system usability,and decision understandability.Our comprehensive review organizes HAII methods into five key categories:(1)learning from human feedback,including various shaping approaches;(2)learning from human demonstration through inverse RL and imitation learning;(3)shared autonomy architectures for dynamic control allocation;(4)human-in-the-loop querying strategies for active learning;and(5)explainable RL techniques for interpretable policy generation.Recent state-of-the-art works are critically reviewed,with particular emphasis on advances incorporating large language models in human-AI interaction research.To illustrate some concepts,we present three detailed case studies:an empirical trust model for farmers adopting AI-driven agricultural management systems,the implementation of ethical constraints in roboticmotion planning through human-guided RL,and an experimental investigation of human trust dynamics using a multi-armed bandit paradigm.These applications demonstrate how HAII principles can enhance RL systems’practical utility while bridging the gap between theoretical RL and real-world human-centered applications,ultimately contributing to more deployable and socially beneficial intelligent systems.展开更多
To address the high costs and operational instability of distribution networks caused by the large-scale integration of distributed energy resources(DERs)(such as photovoltaic(PV)systems,wind turbines(WT),and energy s...To address the high costs and operational instability of distribution networks caused by the large-scale integration of distributed energy resources(DERs)(such as photovoltaic(PV)systems,wind turbines(WT),and energy storage(ES)devices),and the increased grid load fluctuations and safety risks due to uncoordinated electric vehicles(EVs)charging,this paper proposes a novel dual-scale hierarchical collaborative optimization strategy.This strategy decouples system-level economic dispatch from distributed EV agent control,effectively solving the resource coordination conflicts arising from the high computational complexity,poor scalability of existing centralized optimization,or the reliance on local information decision-making in fully decentralized frameworks.At the lower level,an EV charging and discharging model with a hybrid discrete-continuous action space is established,and optimized using an improved Parameterized Deep Q-Network(PDQN)algorithm,which directly handles mode selection and power regulation while embedding physical constraints to ensure safety.At the upper level,microgrid(MG)operators adopt a dynamic pricing strategy optimized through Deep Reinforcement Learning(DRL)to maximize economic benefits and achieve peak-valley shaving.Simulation results show that the proposed strategy outperforms traditional methods,reducing the total operating cost of the MG by 21.6%,decreasing the peak-to-valley load difference by 33.7%,reducing the number of voltage limit violations by 88.9%,and lowering the average electricity cost for EV users by 15.2%.This method brings a win-win result for operators and users,providing a reliable and efficient scheduling solution for distribution networks with high renewable energy penetration rates.展开更多
Effective partitioning is crucial for enabling parallel restoration of power systems after blackouts.This paper proposes a novel partitioning method based on deep reinforcement learning.First,the partitioning decision...Effective partitioning is crucial for enabling parallel restoration of power systems after blackouts.This paper proposes a novel partitioning method based on deep reinforcement learning.First,the partitioning decision process is formulated as a Markov decision process(MDP)model to maximize the modularity.Corresponding key partitioning constraints on parallel restoration are considered.Second,based on the partitioning objective and constraints,the reward function of the partitioning MDP model is set by adopting a relative deviation normalization scheme to reduce mutual interference between the reward and penalty in the reward function.The soft bonus scaling mechanism is introduced to mitigate overestimation caused by abrupt jumps in the reward.Then,the deep Q network method is applied to solve the partitioning MDP model and generate partitioning schemes.Two experience replay buffers are employed to speed up the training process of the method.Finally,case studies on the IEEE 39-bus test system demonstrate that the proposed method can generate a high-modularity partitioning result that meets all key partitioning constraints,thereby improving the parallelism and reliability of the restoration process.Moreover,simulation results demonstrate that an appropriate discount factor is crucial for ensuring both the convergence speed and the stability of the partitioning training.展开更多
As joint operations have become a key trend in modern military development,unmanned aerial vehicles(UAVs)play an increasingly important role in enhancing the intelligence and responsiveness of combat systems.However,t...As joint operations have become a key trend in modern military development,unmanned aerial vehicles(UAVs)play an increasingly important role in enhancing the intelligence and responsiveness of combat systems.However,the heterogeneity of aircraft,partial observability,and dynamic uncertainty in operational airspace pose significant challenges to autonomous collision avoidance using traditional methods.To address these issues,this paper proposes an adaptive collision avoidance approach for UAVs based on deep reinforcement learning.First,a unified uncertainty model incorporating dynamic wind fields is constructed to capture the complexity of joint operational environments.Then,to effectively handle the heterogeneity between manned and unmanned aircraft and the limitations of dynamic observations,a sector-based partial observation mechanism is designed.A Dynamic Threat Prioritization Assessment algorithm is also proposed to evaluate potential collision threats from multiple dimensions,including time to closest approach,minimum separation distance,and aircraft type.Furthermore,a Hierarchical Prioritized Experience Replay(HPER)mechanism is introduced,which classifies experience samples into high,medium,and low priority levels to preferentially sample critical experiences,thereby improving learning efficiency and accelerating policy convergence.Simulation results show that the proposed HPER-D3QN algorithm outperforms existing methods in terms of learning speed,environmental adaptability,and robustness,significantly enhancing collision avoidance performance and convergence rate.Finally,transfer experiments on a high-fidelity battlefield airspace simulation platform validate the proposed method's deployment potential and practical applicability in complex,real-world joint operational scenarios.展开更多
Unmanned Aerial Vehicle(UAV)plays a prominent role in various fields,and autonomous navigation is a crucial component of UAV intelligence.Deep Reinforcement Learning(DRL)has expanded the research avenues for addressin...Unmanned Aerial Vehicle(UAV)plays a prominent role in various fields,and autonomous navigation is a crucial component of UAV intelligence.Deep Reinforcement Learning(DRL)has expanded the research avenues for addressing challenges in autonomous navigation.Nonetheless,challenges persist,including getting stuck in local optima,consuming excessive computations during action space exploration,and neglecting deterministic experience.This paper proposes a noise-driven enhancement strategy.In accordance with the overall learning phases,a global noise control method is designed,while a differentiated local noise control method is developed by analyzing the exploration demands of four typical situations encountered by UAV during navigation.Both methods are integrated into a dual-model for noise control to regulate action space exploration.Furthermore,noise dual experience replay buffers are designed to optimize the rational utilization of both deterministic and noisy experience.In uncertain environments,based on the Twin Delay Deep Deterministic Policy Gradient(TD3)algorithm with Long Short-Term Memory(LSTM)network and Priority Experience Replay(PER),a Noise-Driven Enhancement Priority Memory TD3(NDE-PMTD3)is developed.We established a simulation environment to compare different algorithms,and the performance of the algorithms is analyzed in various scenarios.The training results indicate that the proposed algorithm accelerates the convergence speed and enhances the convergence stability.In test experiments,the proposed algorithm successfully and efficiently performs autonomous navigation tasks in diverse environments,demonstrating superior generalization results.展开更多
With the advent of sixth-generation mobile communications(6G),space-air-ground integrated networks have become mainstream.This paper focuses on collaborative scheduling for mobile edge computing(MEC)under a three-tier...With the advent of sixth-generation mobile communications(6G),space-air-ground integrated networks have become mainstream.This paper focuses on collaborative scheduling for mobile edge computing(MEC)under a three-tier heterogeneous architecture composed of mobile devices,unmanned aerial vehicles(UAVs),and macro base stations(BSs).This scenario typically faces fast channel fading,dynamic computational loads,and energy constraints,whereas classical queuing-theoretic or convex-optimization approaches struggle to yield robust solutions in highly dynamic settings.To address this issue,we formulate a multi-agent Markov decision process(MDP)for an air-ground-fused MEC system,unify link selection,bandwidth/power allocation,and task offloading into a continuous action space and propose a joint scheduling strategy that is based on an improved MATD3 algorithm.The improvements include Alternating Layer Normalization(ALN)in the actor to suppress gradient variance,Residual Orthogonalization(RO)in the critic to reduce the correlation between the twin Q-value estimates,and a dynamic-temperature reward to enable adaptive trade-offs during training.On a multi-user,dual-link simulation platform,we conduct ablation and baseline comparisons.The results reveal that the proposed method has better convergence and stability.Compared with MADDPG,TD3,and DSAC,our algorithm achieves more robust performance across key metrics.展开更多
As the types of traffic requests increase,the elastic optical network(EON)is considered as a promising architecture to carry multiple types of traffic requests simultaneously,including immediate reservation(IR)and adv...As the types of traffic requests increase,the elastic optical network(EON)is considered as a promising architecture to carry multiple types of traffic requests simultaneously,including immediate reservation(IR)and advance reservation(AR).Various resource allocation schemes for IR/AR requests have been designed in EON to reduce bandwidth blocking probability(BBP).However,these schemes do not consider different transmission requirements of IR requests and cannot maintain a low BBP for high-priority requests.In this paper,multi-priority is considered in the hybrid IR/AR request scenario.We modify the asynchronous advantage actor critic(A3C)model and propose an A3C-assisted priority resource allocation(APRA)algorithm.The APRA integrates priority and transmission quality of IR requests to design the A3C reward function,then dynamically allocates dedicated resources for different IR requests according to the time-varying requirements.By maximizing the reward,the transmission quality of IR requests can be matched with the priority,and lower BBP for high-priority IR requests can be ensured.Simulation results show that the APRA reduces the BBP of high-priority IR requests from 0.0341 to0.0138,and the overall network operation gain is improved by 883 compared to the scheme without considering the priority.展开更多
To address the challenge of achieving decentralized,scalable,and adaptive control for large-scale multiple unmanned aerial vehicle(multi-UAV)swarms in dynamic urban environments with obstacles and wind perturbations,w...To address the challenge of achieving decentralized,scalable,and adaptive control for large-scale multiple unmanned aerial vehicle(multi-UAV)swarms in dynamic urban environments with obstacles and wind perturbations,we proposed a hybrid framework integrating adaptive reinforcement learning(RL),multi-modal perception fusion,and enhanced pigeon flock optimization(PFO)with curiosity-driven exploration to enable robust autonomous and formation control.The framework leverages meta-learning to optimize RL policies for real-time adaptation,fuses sensor data for precise state estimation,and enhances PFO with learned leader-follower dynamics and exploration rewards to maintain cohesive formations and explore uncertain areas.For swarms of 10–30 UAVs,it achieves 34%faster convergence,61%reduced stability root mean square error(RMSE),88%fewer collisions and 85.6%–92.3%success rates in target detection and encirclement,outperforming standard multi-agent RL,pure PFO,and single-modality RL.Three-dimensional trajectory visualizations confirm cohesive formations,collision-free maneuvers,and efficient exploration in urban search-and-rescue scenarios.Innovations include meta-RL for rapid adaptation,multi-modal fusion for robust perception,and curiosity-driven PFO for scalable,decentralized control,advancing real-world multi-UAV swarm autonomy and coordination.展开更多
Underwater images frequently suffer from chromatic distortion,blurred details,and low contrast,posing significant challenges for enhancement.This paper introduces AquaTree,a novel underwater image enhancement(UIE)meth...Underwater images frequently suffer from chromatic distortion,blurred details,and low contrast,posing significant challenges for enhancement.This paper introduces AquaTree,a novel underwater image enhancement(UIE)method that reformulates the task as a Markov Decision Process(MDP)through the integration of Monte Carlo Tree Search(MCTS)and deep reinforcement learning(DRL).The framework employs an action space of 25 enhancement operators,strategically grouped for basic attribute adjustment,color component balance,correction,and deblurring.Exploration within MCTS is guided by a dual-branch convolutional network,enabling intelligent sequential operator selection.Our core contributions include:(1)a multimodal state representation combining CIELab color histograms with deep perceptual features,(2)a dual-objective reward mechanism optimizing chromatic fidelity and perceptual consistency,and(3)an alternating training strategy co-optimizing enhancement sequences and network parameters.We further propose two inference schemes:an MCTS-based approach prioritizing accuracy at higher computational cost,and an efficient network policy enabling real-time processing with minimal quality loss.Comprehensive evaluations on the UIEB Dataset and Color correction and haze removal comparisons on the U45 Dataset demonstrate AquaTree’s superiority,significantly outperforming nine state-of-the-art methods across five established underwater image quality metrics.展开更多
While reinforcement learning-based underwater acoustic adaptive modulation shows promise for enabling environment-adaptive communication as supported by extensive simulation-based research,its practical performance re...While reinforcement learning-based underwater acoustic adaptive modulation shows promise for enabling environment-adaptive communication as supported by extensive simulation-based research,its practical performance remains underexplored in field investigations.To evaluate the practical applicability of this emerging technique in adverse shallow sea channels,a field experiment was conducted using three communication modes:orthogonal frequency division multiplexing(OFDM),M-ary frequency-shift keying(MFSK),and direct sequence spread spectrum(DSSS)for reinforcement learning-driven adaptive modulation.Specifically,a Q-learning method is used to select the optimal modulation mode according to the channel quality quantified by signal-to-noise ratio,multipath spread length,and Doppler frequency offset.Experimental results demonstrate that the reinforcement learning-based adaptive modulation scheme outperformed fixed threshold detection in terms of total throughput and average bit error rate,surpassing conventional adaptive modulation strategies.展开更多
This paper investigates the challenges associated with Unmanned Aerial Vehicle (UAV) collaborative search and target tracking in dynamic and unknown environments characterized by limited field of view. The primary obj...This paper investigates the challenges associated with Unmanned Aerial Vehicle (UAV) collaborative search and target tracking in dynamic and unknown environments characterized by limited field of view. The primary objective is to explore the unknown environments to locate and track targets effectively. To address this problem, we propose a novel Multi-Agent Reinforcement Learning (MARL) method based on Graph Neural Network (GNN). Firstly, a method is introduced for encoding continuous-space multi-UAV problem data into spatial graphs which establish essential relationships among agents, obstacles, and targets. Secondly, a Graph AttenTion network (GAT) model is presented, which focuses exclusively on adjacent nodes, learns attention weights adaptively and allows agents to better process information in dynamic environments. Reward functions are specifically designed to tackle exploration challenges in environments with sparse rewards. By introducing a framework that integrates centralized training and distributed execution, the advancement of models is facilitated. Simulation results show that the proposed method outperforms the existing MARL method in search rate and tracking performance with less collisions. The experiments show that the proposed method can be extended to applications with a larger number of agents, which provides a potential solution to the challenging problem of multi-UAV autonomous tracking in dynamic unknown environments.展开更多
The rapid advancement of Industry 4.0 has revolutionized manufacturing,shifting production from centralized control to decentralized,intelligent systems.Smart factories are now expected to achieve high adaptability an...The rapid advancement of Industry 4.0 has revolutionized manufacturing,shifting production from centralized control to decentralized,intelligent systems.Smart factories are now expected to achieve high adaptability and resource efficiency,particularly in mass customization scenarios where production schedules must accommodate dynamic and personalized demands.To address the challenges of dynamic task allocation,uncertainty,and realtime decision-making,this paper proposes Pathfinder,a deep reinforcement learning-based scheduling framework.Pathfinder models scheduling data through three key matrices:execution time(the time required for a job to complete),completion time(the actual time at which a job is finished),and efficiency(the performance of executing a single job).By leveraging neural networks,Pathfinder extracts essential features from these matrices,enabling intelligent decision-making in dynamic production environments.Unlike traditional approaches with fixed scheduling rules,Pathfinder dynamically selects from ten diverse scheduling rules,optimizing decisions based on real-time environmental conditions.To further enhance scheduling efficiency,a specialized reward function is designed to support dynamic task allocation and real-time adjustments.This function helps Pathfinder continuously refine its scheduling strategy,improving machine utilization and minimizing job completion times.Through reinforcement learning,Pathfinder adapts to evolving production demands,ensuring robust performance in real-world applications.Experimental results demonstrate that Pathfinder outperforms traditional scheduling approaches,offering improved coordination and efficiency in smart factories.By integrating deep reinforcement learning,adaptable scheduling strategies,and an innovative reward function,Pathfinder provides an effective solution to the growing challenges of multi-robot job scheduling in mass customization environments.展开更多
In multiple Unmanned Aerial Vehicles(UAV)systems,achieving efficient navigation is essential for executing complex tasks and enhancing autonomy.Traditional navigation methods depend on predefined control strategies an...In multiple Unmanned Aerial Vehicles(UAV)systems,achieving efficient navigation is essential for executing complex tasks and enhancing autonomy.Traditional navigation methods depend on predefined control strategies and trajectory planning and often perform poorly in complex environments.To improve the UAV-environment interaction efficiency,this study proposes a multi-UAV integrated navigation algorithm based on Deep Reinforcement Learning(DRL).This algorithm integrates the Inertial Navigation System(INS),Global Navigation Satellite System(GNSS),and Visual Navigation System(VNS)for comprehensive information fusion.Specifically,an improved multi-UAV integrated navigation algorithm called Information Fusion with MultiAgent Deep Deterministic Policy Gradient(IF-MADDPG)was developed.This algorithm enables UAVs to learn collaboratively and optimize their flight trajectories in real time.Through simulations and experiments,test scenarios in GNSS-denied environments were constructed to evaluate the effectiveness of the algorithm.The experimental results demonstrate that the IF-MADDPG algorithm significantly enhances the collaborative navigation capabilities of multiple UAVs in formation maintenance and GNSS-denied environments.Additionally,it has advantages in terms of mission completion time.This study provides a novel approach for efficient collaboration in multi-UAV systems,which significantly improves the robustness and adaptability of navigation systems.展开更多
To solve problems of poor security guarantee and insufficient training efficiency in the conventional reinforcement learning methods for decision-making,this study proposes a hybrid framework to combine deep reinforce...To solve problems of poor security guarantee and insufficient training efficiency in the conventional reinforcement learning methods for decision-making,this study proposes a hybrid framework to combine deep reinforcement learning with rule-based decision-making methods.A risk assessment model for lane-change maneuvers considering uncertain predictions of surrounding vehicles is established as a safety filter to improve learning efficiency while correcting dangerous actions for safety enhancement.On this basis,a Risk-fused DDQN is constructed utilizing the model-based risk assessment and supervision mechanism.The proposed reinforcement learning algorithm sets up a separate experience buffer for dangerous trials and punishes such actions,which is shown to improve the sampling efficiency and training outcomes.Compared with conventional DDQN methods,the proposed algorithm improves the convergence value of cumulated reward by 7.6%and 2.2%in the two constructed scenarios in the simulation study and reduces the number of training episodes by 52.2%and 66.8%respectively.The success rate of lane change is improved by 57.3%while the time headway is increased at least by 16.5%in real vehicle tests,which confirms the higher training efficiency,scenario adaptability,and security of the proposed Risk-fused DDQN.展开更多
基金funded by the Deanship of Scientific Research(DSR)at King Abdulaziz University,Jeddah,Saudi Arabia,grant number RG-2-611-42(A.O.A.).
文摘Wi-Fi technology has evolved significantly since its introduction in 1997,advancing to Wi-Fi 6 as the latest standard,with Wi-Fi 7 currently under development.Despite these advancements,integrating machine learning into Wi-Fi networks remains challenging,especially in decentralized environments with multiple access points(mAPs).This paper is a short review that summarizes the potential applications of federated reinforcement learning(FRL)across eight key areas of Wi-Fi functionality,including channel access,link adaptation,beamforming,multi-user transmissions,channel bonding,multi-link operation,spatial reuse,and multi-basic servic set(multi-BSS)coordination.FRL is highlighted as a promising framework for enabling decentralized training and decision-making while preserving data privacy.To illustrate its role in practice,we present a case study on link activation in a multi-link operation(MLO)environment with multiple APs.Through theoretical discussion and simulation results,the study demonstrates how FRL can improve performance and reliability,paving the way for more adaptive and collaborative Wi-Fi networks in the era of Wi-Fi 7 and beyond.
基金co-supported by the National Natural Science Foundation of China(Nos.72371052 and 71871042).
文摘Addressing optimal confrontation methods in multi-agent attack-defense scenarios is a complex challenge.Multi-Agent Reinforcement Learning(MARL)provides an effective framework for tackling sequential decision-making problems,significantly enhancing swarm intelligence in maneuvering.However,applying MARL to unmanned swarms presents two primary challenges.First,defensive agents must balance autonomy with collaboration under limited perception while coordinating against adversaries.Second,current algorithms aim to maximize global or individual rewards,making them sensitive to fluctuations in enemy strategies and environmental changes,especially when rewards are sparse.To tackle these issues,we propose an algorithm of MultiAgent Reinforcement Learning with Layered Autonomy and Collaboration(MARL-LAC)for collaborative confrontations.This algorithm integrates dual twin Critics to mitigate the high variance associated with policy gradients.Furthermore,MARL-LAC employs layered autonomy and collaboration to address multi-objective problems,specifically learning a global reward function for the swarm alongside local reward functions for individual defensive agents.Experimental results demonstrate that MARL-LAC enhances decision-making and collaborative behaviors among agents,outperforming the existing algorithms and emphasizing the importance of layered autonomy and collaboration in multi-agent systems.The observed adversarial behaviors demonstrate that agents using MARL-LAC effectively maintain cohesive formations that conceal their intentions by confusing the offensive agent while successfully encircling the target.
文摘Adversarial Reinforcement Learning(ARL)models for intelligent devices and Network Intrusion Detection Systems(NIDS)improve systemresilience against sophisticated cyber-attacks.As a core component of ARL,Adversarial Training(AT)enables NIDS agents to discover and prevent newattack paths by exposing them to competing examples,thereby increasing detection accuracy,reducing False Positives(FPs),and enhancing network security.To develop robust decision-making capabilities for real-world network disruptions and hostile activity,NIDS agents are trained in adversarial scenarios to monitor the current state and notify management of any abnormal or malicious activity.The accuracy and timeliness of the IDS were crucial to the network’s availability and reliability at this time.This paper analyzes ARL applications in NIDS,revealing State-of-The-Art(SoTA)methodology,issues,and future research prospects.This includes Reinforcement Machine Learning(RML)-based NIDS,which enables an agent to interact with the environment to achieve a goal,andDeep Reinforcement Learning(DRL)-based NIDS,which can solve complex decision-making problems.Additionally,this survey study addresses cybersecurity adversarial circumstances and their importance for ARL and NIDS.Architectural design,RL algorithms,feature representation,and training methodologies are examined in the ARL-NIDS study.This comprehensive study evaluates ARL for intelligent NIDS research,benefiting cybersecurity researchers,practitioners,and policymakers.The report promotes cybersecurity defense research and innovation.
基金funded by Hung Yen University of Technology and Education under grand number UTEHY.L.2025.62.
文摘Unmanned Aerial Vehicles(UAVs)have become integral components in smart city infrastructures,supporting applications such as emergency response,surveillance,and data collection.However,the high mobility and dynamic topology of Flying Ad Hoc Networks(FANETs)present significant challenges for maintaining reliable,low-latency communication.Conventional geographic routing protocols often struggle in situations where link quality varies and mobility patterns are unpredictable.To overcome these limitations,this paper proposes an improved routing protocol based on reinforcement learning.This new approach integrates Q-learning with mechanisms that are both link-aware and mobility-aware.The proposed method optimizes the selection of relay nodes by using an adaptive reward function that takes into account energy consumption,delay,and link quality.Additionally,a Kalman filter is integrated to predict UAV mobility,improving the stability of communication links under dynamic network conditions.Simulation experiments were conducted using realistic scenarios,varying the number of UAVs to assess scalability.An analysis was conducted on key performance metrics,including the packet delivery ratio,end-to-end delay,and total energy consumption.The results demonstrate that the proposed approach significantly improves the packet delivery ratio by 12%–15%and reduces delay by up to 25.5%when compared to conventional GEO and QGEO protocols.However,this improvement comes at the cost of higher energy consumption due to additional computations and control overhead.Despite this trade-off,the proposed solution ensures reliable and efficient communication,making it well-suited for large-scale UAV networks operating in complex urban environments.
基金supported by Key Science and Technology Program of Henan Province,China(Grant Nos.242102210147,242102210027)Fujian Province Young and Middle aged Teacher Education Research Project(Science and Technology Category)(No.JZ240101)(Corresponding author:Dong Yuan).
文摘Vehicle Edge Computing(VEC)and Cloud Computing(CC)significantly enhance the processing efficiency of delay-sensitive and computation-intensive applications by offloading compute-intensive tasks from resource-constrained onboard devices to nearby Roadside Unit(RSU),thereby achieving lower delay and energy consumption.However,due to the limited storage capacity and energy budget of RSUs,it is challenging to meet the demands of the highly dynamic Internet of Vehicles(IoV)environment.Therefore,determining reasonable service caching and computation offloading strategies is crucial.To address this,this paper proposes a joint service caching scheme for cloud-edge collaborative IoV computation offloading.By modeling the dynamic optimization problem using Markov Decision Processes(MDP),the scheme jointly optimizes task delay,energy consumption,load balancing,and privacy entropy to achieve better quality of service.Additionally,a dynamic adaptive multi-objective deep reinforcement learning algorithm is proposed.Each Double Deep Q-Network(DDQN)agent obtains rewards for different objectives based on distinct reward functions and dynamically updates the objective weights by learning the value changes between objectives using Radial Basis Function Networks(RBFN),thereby efficiently approximating the Pareto-optimal decisions for multiple objectives.Extensive experiments demonstrate that the proposed algorithm can better coordinate the three-tier computing resources of cloud,edge,and vehicles.Compared to existing algorithms,the proposed method reduces task delay and energy consumption by 10.64%and 5.1%,respectively.
文摘At present,energy consumption is one of the main bottlenecks in autonomous mobile robot development.To address the challenge of high energy consumption in path planning for autonomous mobile robots navigating unknown and complex environments,this paper proposes an Attention-Enhanced Dueling Deep Q-Network(ADDueling DQN),which integrates a multi-head attention mechanism and a prioritized experience replay strategy into a Dueling-DQN reinforcement learning framework.A multi-objective reward function,centered on energy efficiency,is designed to comprehensively consider path length,terrain slope,motion smoothness,and obstacle avoidance,enabling optimal low-energy trajectory generation in 3D space from the source.The incorporation of a multihead attention mechanism allows the model to dynamically focus on energy-critical state features—such as slope gradients and obstacle density—thereby significantly improving its ability to recognize and avoid energy-intensive paths.Additionally,the prioritized experience replay mechanism accelerates learning from key decision-making experiences,suppressing inefficient exploration and guiding the policy toward low-energy solutions more rapidly.The effectiveness of the proposed path planning algorithm is validated through simulation experiments conducted in multiple off-road scenarios.Results demonstrate that AD-Dueling DQN consistently achieves the lowest average energy consumption across all tested environments.Moreover,the proposed method exhibits faster convergence and greater training stability compared to baseline algorithms,highlighting its global optimization capability under energy-aware objectives in complex terrains.This study offers an efficient and scalable intelligent control strategy for the development of energy-conscious autonomous navigation systems.
基金funded by the U.S.Department of Education under Grant Number ED#P116S210005the National Science Foundation under Grant Numbers 2226936 and 2420405.
文摘Theintegration of human factors into artificial intelligence(AI)systems has emerged as a critical research frontier,particularly in reinforcement learning(RL),where human-AI interaction(HAII)presents both opportunities and challenges.As RL continues to demonstrate remarkable success in model-free and partially observable environments,its real-world deployment increasingly requires effective collaboration with human operators and stakeholders.This article systematically examines HAII techniques in RL through both theoretical analysis and practical case studies.We establish a conceptual framework built upon three fundamental pillars of effective human-AI collaboration:computational trust modeling,system usability,and decision understandability.Our comprehensive review organizes HAII methods into five key categories:(1)learning from human feedback,including various shaping approaches;(2)learning from human demonstration through inverse RL and imitation learning;(3)shared autonomy architectures for dynamic control allocation;(4)human-in-the-loop querying strategies for active learning;and(5)explainable RL techniques for interpretable policy generation.Recent state-of-the-art works are critically reviewed,with particular emphasis on advances incorporating large language models in human-AI interaction research.To illustrate some concepts,we present three detailed case studies:an empirical trust model for farmers adopting AI-driven agricultural management systems,the implementation of ethical constraints in roboticmotion planning through human-guided RL,and an experimental investigation of human trust dynamics using a multi-armed bandit paradigm.These applications demonstrate how HAII principles can enhance RL systems’practical utility while bridging the gap between theoretical RL and real-world human-centered applications,ultimately contributing to more deployable and socially beneficial intelligent systems.
基金supported in part by the Research on Key Technologies for the Development of an Active Balancing Cooperative Control Systemfor Distribution Networks and the National Natural Science Foundation of China under Grant 521532240029,Grant 62303006.
文摘To address the high costs and operational instability of distribution networks caused by the large-scale integration of distributed energy resources(DERs)(such as photovoltaic(PV)systems,wind turbines(WT),and energy storage(ES)devices),and the increased grid load fluctuations and safety risks due to uncoordinated electric vehicles(EVs)charging,this paper proposes a novel dual-scale hierarchical collaborative optimization strategy.This strategy decouples system-level economic dispatch from distributed EV agent control,effectively solving the resource coordination conflicts arising from the high computational complexity,poor scalability of existing centralized optimization,or the reliance on local information decision-making in fully decentralized frameworks.At the lower level,an EV charging and discharging model with a hybrid discrete-continuous action space is established,and optimized using an improved Parameterized Deep Q-Network(PDQN)algorithm,which directly handles mode selection and power regulation while embedding physical constraints to ensure safety.At the upper level,microgrid(MG)operators adopt a dynamic pricing strategy optimized through Deep Reinforcement Learning(DRL)to maximize economic benefits and achieve peak-valley shaving.Simulation results show that the proposed strategy outperforms traditional methods,reducing the total operating cost of the MG by 21.6%,decreasing the peak-to-valley load difference by 33.7%,reducing the number of voltage limit violations by 88.9%,and lowering the average electricity cost for EV users by 15.2%.This method brings a win-win result for operators and users,providing a reliable and efficient scheduling solution for distribution networks with high renewable energy penetration rates.
基金funded by the Beijing Engineering Research Center of Electric Rail Transportation.
文摘Effective partitioning is crucial for enabling parallel restoration of power systems after blackouts.This paper proposes a novel partitioning method based on deep reinforcement learning.First,the partitioning decision process is formulated as a Markov decision process(MDP)model to maximize the modularity.Corresponding key partitioning constraints on parallel restoration are considered.Second,based on the partitioning objective and constraints,the reward function of the partitioning MDP model is set by adopting a relative deviation normalization scheme to reduce mutual interference between the reward and penalty in the reward function.The soft bonus scaling mechanism is introduced to mitigate overestimation caused by abrupt jumps in the reward.Then,the deep Q network method is applied to solve the partitioning MDP model and generate partitioning schemes.Two experience replay buffers are employed to speed up the training process of the method.Finally,case studies on the IEEE 39-bus test system demonstrate that the proposed method can generate a high-modularity partitioning result that meets all key partitioning constraints,thereby improving the parallelism and reliability of the restoration process.Moreover,simulation results demonstrate that an appropriate discount factor is crucial for ensuring both the convergence speed and the stability of the partitioning training.
基金supported by the National Key Research and Development Program of China(No.2022YFB4300902).
文摘As joint operations have become a key trend in modern military development,unmanned aerial vehicles(UAVs)play an increasingly important role in enhancing the intelligence and responsiveness of combat systems.However,the heterogeneity of aircraft,partial observability,and dynamic uncertainty in operational airspace pose significant challenges to autonomous collision avoidance using traditional methods.To address these issues,this paper proposes an adaptive collision avoidance approach for UAVs based on deep reinforcement learning.First,a unified uncertainty model incorporating dynamic wind fields is constructed to capture the complexity of joint operational environments.Then,to effectively handle the heterogeneity between manned and unmanned aircraft and the limitations of dynamic observations,a sector-based partial observation mechanism is designed.A Dynamic Threat Prioritization Assessment algorithm is also proposed to evaluate potential collision threats from multiple dimensions,including time to closest approach,minimum separation distance,and aircraft type.Furthermore,a Hierarchical Prioritized Experience Replay(HPER)mechanism is introduced,which classifies experience samples into high,medium,and low priority levels to preferentially sample critical experiences,thereby improving learning efficiency and accelerating policy convergence.Simulation results show that the proposed HPER-D3QN algorithm outperforms existing methods in terms of learning speed,environmental adaptability,and robustness,significantly enhancing collision avoidance performance and convergence rate.Finally,transfer experiments on a high-fidelity battlefield airspace simulation platform validate the proposed method's deployment potential and practical applicability in complex,real-world joint operational scenarios.
基金the Collaborative Innovation Project of Shanghai,China for the financial support。
文摘Unmanned Aerial Vehicle(UAV)plays a prominent role in various fields,and autonomous navigation is a crucial component of UAV intelligence.Deep Reinforcement Learning(DRL)has expanded the research avenues for addressing challenges in autonomous navigation.Nonetheless,challenges persist,including getting stuck in local optima,consuming excessive computations during action space exploration,and neglecting deterministic experience.This paper proposes a noise-driven enhancement strategy.In accordance with the overall learning phases,a global noise control method is designed,while a differentiated local noise control method is developed by analyzing the exploration demands of four typical situations encountered by UAV during navigation.Both methods are integrated into a dual-model for noise control to regulate action space exploration.Furthermore,noise dual experience replay buffers are designed to optimize the rational utilization of both deterministic and noisy experience.In uncertain environments,based on the Twin Delay Deep Deterministic Policy Gradient(TD3)algorithm with Long Short-Term Memory(LSTM)network and Priority Experience Replay(PER),a Noise-Driven Enhancement Priority Memory TD3(NDE-PMTD3)is developed.We established a simulation environment to compare different algorithms,and the performance of the algorithms is analyzed in various scenarios.The training results indicate that the proposed algorithm accelerates the convergence speed and enhances the convergence stability.In test experiments,the proposed algorithm successfully and efficiently performs autonomous navigation tasks in diverse environments,demonstrating superior generalization results.
文摘With the advent of sixth-generation mobile communications(6G),space-air-ground integrated networks have become mainstream.This paper focuses on collaborative scheduling for mobile edge computing(MEC)under a three-tier heterogeneous architecture composed of mobile devices,unmanned aerial vehicles(UAVs),and macro base stations(BSs).This scenario typically faces fast channel fading,dynamic computational loads,and energy constraints,whereas classical queuing-theoretic or convex-optimization approaches struggle to yield robust solutions in highly dynamic settings.To address this issue,we formulate a multi-agent Markov decision process(MDP)for an air-ground-fused MEC system,unify link selection,bandwidth/power allocation,and task offloading into a continuous action space and propose a joint scheduling strategy that is based on an improved MATD3 algorithm.The improvements include Alternating Layer Normalization(ALN)in the actor to suppress gradient variance,Residual Orthogonalization(RO)in the critic to reduce the correlation between the twin Q-value estimates,and a dynamic-temperature reward to enable adaptive trade-offs during training.On a multi-user,dual-link simulation platform,we conduct ablation and baseline comparisons.The results reveal that the proposed method has better convergence and stability.Compared with MADDPG,TD3,and DSAC,our algorithm achieves more robust performance across key metrics.
文摘As the types of traffic requests increase,the elastic optical network(EON)is considered as a promising architecture to carry multiple types of traffic requests simultaneously,including immediate reservation(IR)and advance reservation(AR).Various resource allocation schemes for IR/AR requests have been designed in EON to reduce bandwidth blocking probability(BBP).However,these schemes do not consider different transmission requirements of IR requests and cannot maintain a low BBP for high-priority requests.In this paper,multi-priority is considered in the hybrid IR/AR request scenario.We modify the asynchronous advantage actor critic(A3C)model and propose an A3C-assisted priority resource allocation(APRA)algorithm.The APRA integrates priority and transmission quality of IR requests to design the A3C reward function,then dynamically allocates dedicated resources for different IR requests according to the time-varying requirements.By maximizing the reward,the transmission quality of IR requests can be matched with the priority,and lower BBP for high-priority IR requests can be ensured.Simulation results show that the APRA reduces the BBP of high-priority IR requests from 0.0341 to0.0138,and the overall network operation gain is improved by 883 compared to the scheme without considering the priority.
基金supported by the National Natural Science Foundation of China(No.62350048)。
文摘To address the challenge of achieving decentralized,scalable,and adaptive control for large-scale multiple unmanned aerial vehicle(multi-UAV)swarms in dynamic urban environments with obstacles and wind perturbations,we proposed a hybrid framework integrating adaptive reinforcement learning(RL),multi-modal perception fusion,and enhanced pigeon flock optimization(PFO)with curiosity-driven exploration to enable robust autonomous and formation control.The framework leverages meta-learning to optimize RL policies for real-time adaptation,fuses sensor data for precise state estimation,and enhances PFO with learned leader-follower dynamics and exploration rewards to maintain cohesive formations and explore uncertain areas.For swarms of 10–30 UAVs,it achieves 34%faster convergence,61%reduced stability root mean square error(RMSE),88%fewer collisions and 85.6%–92.3%success rates in target detection and encirclement,outperforming standard multi-agent RL,pure PFO,and single-modality RL.Three-dimensional trajectory visualizations confirm cohesive formations,collision-free maneuvers,and efficient exploration in urban search-and-rescue scenarios.Innovations include meta-RL for rapid adaptation,multi-modal fusion for robust perception,and curiosity-driven PFO for scalable,decentralized control,advancing real-world multi-UAV swarm autonomy and coordination.
基金supported by theHubei Provincial Technology Innovation Special Project and the Natural Science Foundation of Hubei Province under Grants 2023BEB024,2024AFC066,respectively.
文摘Underwater images frequently suffer from chromatic distortion,blurred details,and low contrast,posing significant challenges for enhancement.This paper introduces AquaTree,a novel underwater image enhancement(UIE)method that reformulates the task as a Markov Decision Process(MDP)through the integration of Monte Carlo Tree Search(MCTS)and deep reinforcement learning(DRL).The framework employs an action space of 25 enhancement operators,strategically grouped for basic attribute adjustment,color component balance,correction,and deblurring.Exploration within MCTS is guided by a dual-branch convolutional network,enabling intelligent sequential operator selection.Our core contributions include:(1)a multimodal state representation combining CIELab color histograms with deep perceptual features,(2)a dual-objective reward mechanism optimizing chromatic fidelity and perceptual consistency,and(3)an alternating training strategy co-optimizing enhancement sequences and network parameters.We further propose two inference schemes:an MCTS-based approach prioritizing accuracy at higher computational cost,and an efficient network policy enabling real-time processing with minimal quality loss.Comprehensive evaluations on the UIEB Dataset and Color correction and haze removal comparisons on the U45 Dataset demonstrate AquaTree’s superiority,significantly outperforming nine state-of-the-art methods across five established underwater image quality metrics.
基金funding from the National Key Research and Development Program of China(No.2018YFE0110000)the National Natural Science Foundation of China(No.11274259,No.11574258)the Science and Technology Commission Foundation of Shanghai(21DZ1205500)in support of the present research.
文摘While reinforcement learning-based underwater acoustic adaptive modulation shows promise for enabling environment-adaptive communication as supported by extensive simulation-based research,its practical performance remains underexplored in field investigations.To evaluate the practical applicability of this emerging technique in adverse shallow sea channels,a field experiment was conducted using three communication modes:orthogonal frequency division multiplexing(OFDM),M-ary frequency-shift keying(MFSK),and direct sequence spread spectrum(DSSS)for reinforcement learning-driven adaptive modulation.Specifically,a Q-learning method is used to select the optimal modulation mode according to the channel quality quantified by signal-to-noise ratio,multipath spread length,and Doppler frequency offset.Experimental results demonstrate that the reinforcement learning-based adaptive modulation scheme outperformed fixed threshold detection in terms of total throughput and average bit error rate,surpassing conventional adaptive modulation strategies.
基金supported by the National Natural Science Foundation of China(Nos.12272104,U22B2013).
文摘This paper investigates the challenges associated with Unmanned Aerial Vehicle (UAV) collaborative search and target tracking in dynamic and unknown environments characterized by limited field of view. The primary objective is to explore the unknown environments to locate and track targets effectively. To address this problem, we propose a novel Multi-Agent Reinforcement Learning (MARL) method based on Graph Neural Network (GNN). Firstly, a method is introduced for encoding continuous-space multi-UAV problem data into spatial graphs which establish essential relationships among agents, obstacles, and targets. Secondly, a Graph AttenTion network (GAT) model is presented, which focuses exclusively on adjacent nodes, learns attention weights adaptively and allows agents to better process information in dynamic environments. Reward functions are specifically designed to tackle exploration challenges in environments with sparse rewards. By introducing a framework that integrates centralized training and distributed execution, the advancement of models is facilitated. Simulation results show that the proposed method outperforms the existing MARL method in search rate and tracking performance with less collisions. The experiments show that the proposed method can be extended to applications with a larger number of agents, which provides a potential solution to the challenging problem of multi-UAV autonomous tracking in dynamic unknown environments.
基金supported by National Natural Science Foundation of China under Grant No.62372110Fujian Provincial Natural Science of Foundation under Grants 2023J02008,2024H0009.
文摘The rapid advancement of Industry 4.0 has revolutionized manufacturing,shifting production from centralized control to decentralized,intelligent systems.Smart factories are now expected to achieve high adaptability and resource efficiency,particularly in mass customization scenarios where production schedules must accommodate dynamic and personalized demands.To address the challenges of dynamic task allocation,uncertainty,and realtime decision-making,this paper proposes Pathfinder,a deep reinforcement learning-based scheduling framework.Pathfinder models scheduling data through three key matrices:execution time(the time required for a job to complete),completion time(the actual time at which a job is finished),and efficiency(the performance of executing a single job).By leveraging neural networks,Pathfinder extracts essential features from these matrices,enabling intelligent decision-making in dynamic production environments.Unlike traditional approaches with fixed scheduling rules,Pathfinder dynamically selects from ten diverse scheduling rules,optimizing decisions based on real-time environmental conditions.To further enhance scheduling efficiency,a specialized reward function is designed to support dynamic task allocation and real-time adjustments.This function helps Pathfinder continuously refine its scheduling strategy,improving machine utilization and minimizing job completion times.Through reinforcement learning,Pathfinder adapts to evolving production demands,ensuring robust performance in real-world applications.Experimental results demonstrate that Pathfinder outperforms traditional scheduling approaches,offering improved coordination and efficiency in smart factories.By integrating deep reinforcement learning,adaptable scheduling strategies,and an innovative reward function,Pathfinder provides an effective solution to the growing challenges of multi-robot job scheduling in mass customization environments.
基金co-supported by the National Natural Science Foundation of China(Nos.92371201 and 52192633)the Natural Science Foundation of Shaanxi Province of China(No.2022JC-03)the Aeronautical Science Foundation of China(No.ASFC-20220019070002)。
文摘In multiple Unmanned Aerial Vehicles(UAV)systems,achieving efficient navigation is essential for executing complex tasks and enhancing autonomy.Traditional navigation methods depend on predefined control strategies and trajectory planning and often perform poorly in complex environments.To improve the UAV-environment interaction efficiency,this study proposes a multi-UAV integrated navigation algorithm based on Deep Reinforcement Learning(DRL).This algorithm integrates the Inertial Navigation System(INS),Global Navigation Satellite System(GNSS),and Visual Navigation System(VNS)for comprehensive information fusion.Specifically,an improved multi-UAV integrated navigation algorithm called Information Fusion with MultiAgent Deep Deterministic Policy Gradient(IF-MADDPG)was developed.This algorithm enables UAVs to learn collaboratively and optimize their flight trajectories in real time.Through simulations and experiments,test scenarios in GNSS-denied environments were constructed to evaluate the effectiveness of the algorithm.The experimental results demonstrate that the IF-MADDPG algorithm significantly enhances the collaborative navigation capabilities of multiple UAVs in formation maintenance and GNSS-denied environments.Additionally,it has advantages in terms of mission completion time.This study provides a novel approach for efficient collaboration in multi-UAV systems,which significantly improves the robustness and adaptability of navigation systems.
基金Supported by National Key Research and Development Program of China(Grant No.2022YFE0117100)National Science Foundation of China(Grant No.52102468,52325212)Fundamental Research Funds for the Central Universities。
文摘To solve problems of poor security guarantee and insufficient training efficiency in the conventional reinforcement learning methods for decision-making,this study proposes a hybrid framework to combine deep reinforcement learning with rule-based decision-making methods.A risk assessment model for lane-change maneuvers considering uncertain predictions of surrounding vehicles is established as a safety filter to improve learning efficiency while correcting dangerous actions for safety enhancement.On this basis,a Risk-fused DDQN is constructed utilizing the model-based risk assessment and supervision mechanism.The proposed reinforcement learning algorithm sets up a separate experience buffer for dangerous trials and punishes such actions,which is shown to improve the sampling efficiency and training outcomes.Compared with conventional DDQN methods,the proposed algorithm improves the convergence value of cumulated reward by 7.6%and 2.2%in the two constructed scenarios in the simulation study and reduces the number of training episodes by 52.2%and 66.8%respectively.The success rate of lane change is improved by 57.3%while the time headway is increased at least by 16.5%in real vehicle tests,which confirms the higher training efficiency,scenario adaptability,and security of the proposed Risk-fused DDQN.