1 Introduction Constrained Reinforcement Learning(CRL),modeled as a Constrained Markov Decision Process(CMDP)[1,2],is commonly used to address applications with security restrictions.Previous works[3]primarily focused...1 Introduction Constrained Reinforcement Learning(CRL),modeled as a Constrained Markov Decision Process(CMDP)[1,2],is commonly used to address applications with security restrictions.Previous works[3]primarily focused on the single-constraint issue,overlooking the more common multi-constraint setting which involves extensive computations and combinatorial optimization of multiple Lagrange multipliers.展开更多
This paper investigates the challenges associated with Unmanned Aerial Vehicle (UAV) collaborative search and target tracking in dynamic and unknown environments characterized by limited field of view. The primary obj...This paper investigates the challenges associated with Unmanned Aerial Vehicle (UAV) collaborative search and target tracking in dynamic and unknown environments characterized by limited field of view. The primary objective is to explore the unknown environments to locate and track targets effectively. To address this problem, we propose a novel Multi-Agent Reinforcement Learning (MARL) method based on Graph Neural Network (GNN). Firstly, a method is introduced for encoding continuous-space multi-UAV problem data into spatial graphs which establish essential relationships among agents, obstacles, and targets. Secondly, a Graph AttenTion network (GAT) model is presented, which focuses exclusively on adjacent nodes, learns attention weights adaptively and allows agents to better process information in dynamic environments. Reward functions are specifically designed to tackle exploration challenges in environments with sparse rewards. By introducing a framework that integrates centralized training and distributed execution, the advancement of models is facilitated. Simulation results show that the proposed method outperforms the existing MARL method in search rate and tracking performance with less collisions. The experiments show that the proposed method can be extended to applications with a larger number of agents, which provides a potential solution to the challenging problem of multi-UAV autonomous tracking in dynamic unknown environments.展开更多
To solve problems of poor security guarantee and insufficient training efficiency in the conventional reinforcement learning methods for decision-making,this study proposes a hybrid framework to combine deep reinforce...To solve problems of poor security guarantee and insufficient training efficiency in the conventional reinforcement learning methods for decision-making,this study proposes a hybrid framework to combine deep reinforcement learning with rule-based decision-making methods.A risk assessment model for lane-change maneuvers considering uncertain predictions of surrounding vehicles is established as a safety filter to improve learning efficiency while correcting dangerous actions for safety enhancement.On this basis,a Risk-fused DDQN is constructed utilizing the model-based risk assessment and supervision mechanism.The proposed reinforcement learning algorithm sets up a separate experience buffer for dangerous trials and punishes such actions,which is shown to improve the sampling efficiency and training outcomes.Compared with conventional DDQN methods,the proposed algorithm improves the convergence value of cumulated reward by 7.6%and 2.2%in the two constructed scenarios in the simulation study and reduces the number of training episodes by 52.2%and 66.8%respectively.The success rate of lane change is improved by 57.3%while the time headway is increased at least by 16.5%in real vehicle tests,which confirms the higher training efficiency,scenario adaptability,and security of the proposed Risk-fused DDQN.展开更多
Exo-atmospheric vehicles are constrained by limited maneuverability,which leads to the contradiction between evasive maneuver and precision strike.To address the problem of Integrated Evasion and Impact(IEI)decision u...Exo-atmospheric vehicles are constrained by limited maneuverability,which leads to the contradiction between evasive maneuver and precision strike.To address the problem of Integrated Evasion and Impact(IEI)decision under multi-constraint conditions,a hierarchical intelligent decision-making method based on Deep Reinforcement Learning(DRL)was proposed.First,an intelligent decision-making framework of“DRL evasion decision”+“impact prediction guidance decision”was established:it takes the impact point deviation correction ability as the constraint and the maximum miss distance as the objective,and effectively solves the problem of poor decisionmaking effect caused by the large IEI decision space.Second,to solve the sparse reward problem faced by evasion decision-making,a hierarchical decision-making method consisting of maneuver timing decision and maneuver duration decision was proposed,and the corresponding Markov Decision Process(MDP)was designed.A detailed simulation experiment was designed to analyze the advantages and computational complexity of the proposed method.Simulation results show that the proposed model has good performance and low computational resource requirement.The minimum miss distance is 21.3 m under the condition of guaranteeing the impact point accuracy,and the single decision-making time is 4.086 ms on an STM32F407 single-chip microcomputer,which has engineering application value.展开更多
Low Earth orbit(LEO)satellite networks exhibit distinct characteristics,e.g.,limited resources of individual satellite nodes and dynamic network topology,which have brought many challenges for routing algorithms.To sa...Low Earth orbit(LEO)satellite networks exhibit distinct characteristics,e.g.,limited resources of individual satellite nodes and dynamic network topology,which have brought many challenges for routing algorithms.To satisfy quality of service(QoS)requirements of various users,it is critical to research efficient routing strategies to fully utilize satellite resources.This paper proposes a multi-QoS information optimized routing algorithm based on reinforcement learning for LEO satellite networks,which guarantees high level assurance demand services to be prioritized under limited satellite resources while considering the load balancing performance of the satellite networks for low level assurance demand services to ensure the full and effective utilization of satellite resources.An auxiliary path search algorithm is proposed to accelerate the convergence of satellite routing algorithm.Simulation results show that the generated routing strategy can timely process and fully meet the QoS demands of high assurance services while effectively improving the load balancing performance of the link.展开更多
Cooperative multi-agent reinforcement learning(MARL)is a key technology for enabling cooperation in complex multi-agent systems.It has achieved remarkable progress in areas such as gaming,autonomous driving,and multi-...Cooperative multi-agent reinforcement learning(MARL)is a key technology for enabling cooperation in complex multi-agent systems.It has achieved remarkable progress in areas such as gaming,autonomous driving,and multi-robot control.Empowering cooperative MARL with multi-task decision-making capabilities is expected to further broaden its application scope.In multi-task scenarios,cooperative MARL algorithms need to address 3 types of multi-task problems:reward-related multi-task,arising from different reward functions;multi-domain multi-task,caused by differences in state and action spaces,state transition functions;and scalability-related multi-task,resulting from the dynamic variation in the number of agents.Most existing studies focus on scalability-related multitask problems.However,with the increasing integration between large language models(LLMs)and multi-agent systems,a growing number of LLM-based multi-agent systems have emerged,enabling more complex multi-task cooperation.This paper provides a comprehensive review of the latest advances in this field.By combining multi-task reinforcement learning with cooperative MARL,we categorize and analyze the 3 major types of multi-task problems under multi-agent settings,offering more fine-grained classifications and summarizing key insights for each.In addition,we summarize commonly used benchmarks and discuss future directions of research in this area,which hold promise for further enhancing the multi-task cooperation capabilities of multi-agent systems and expanding their practical applications in the real world.展开更多
Small modular reactor(SMR)belongs to the research forefront of nuclear reactor technology.Nowadays,advancement of intelligent control technologies paves a new way to the design and build of unmanned SMR.The autonomous...Small modular reactor(SMR)belongs to the research forefront of nuclear reactor technology.Nowadays,advancement of intelligent control technologies paves a new way to the design and build of unmanned SMR.The autonomous control process of SMR can be divided into three stages,say,state diagnosis,autonomous decision-making and coordinated control.In this paper,the autonomous state recognition and task planning of unmanned SMR are investigated.An operating condition recognition method based on the knowledge base of SMR operation is proposed by using the artificial neural network(ANN)technology,which constructs a basis for the state judgment of intelligent reactor control path planning.An improved reinforcement learning path planning algorithm is utilized to implement the path transfer decision-makingThis algorithm performs condition transitions with minimal cost under specified modes.In summary,the full range control path intelligent decision-planning technology of SMR is realized,thus provides some theoretical basis for the design and build of unmanned SMR in the future.展开更多
The exponential growth of Internet ofThings(IoT)devices has created unprecedented challenges in data processing and resource management for time-critical applications.Traditional cloud computing paradigms cannot meet ...The exponential growth of Internet ofThings(IoT)devices has created unprecedented challenges in data processing and resource management for time-critical applications.Traditional cloud computing paradigms cannot meet the stringent latency requirements of modern IoT systems,while pure edge computing faces resource constraints that limit processing capabilities.This paper addresses these challenges by proposing a novel Deep Reinforcement Learning(DRL)-enhanced priority-based scheduling framework for hybrid edge-cloud computing environments.Our approach integrates adaptive priority assignment with a two-level concurrency control protocol that ensures both optimal performance and data consistency.The framework introduces three key innovations:(1)a DRL-based dynamic priority assignmentmechanism that learns fromsystem behavior,(2)a hybrid concurrency control protocol combining local edge validation with global cloud coordination,and(3)an integrated mathematical model that formalizes sensor-driven transactions across edge-cloud architectures.Extensive simulations across diverse workload scenarios demonstrate significant quantitative improvements:40%latency reduction,25%throughput increase,85%resource utilization(compared to 60%for heuristicmethods),40%reduction in energy consumption(300 vs.500 J per task),and 50%improvement in scalability factor(1.8 vs.1.2 for EDF)compared to state-of-the-art heuristic and meta-heuristic approaches.These results establish the framework as a robust solution for large-scale IoT and autonomous applications requiring real-time processing with consistency guarantees.展开更多
The knapsack problem is a classical combinatorial optimization problem widely encountered in areas such as logistics,resource allocation,and portfolio optimization.Traditional methods,including dynamic program-ming(DP...The knapsack problem is a classical combinatorial optimization problem widely encountered in areas such as logistics,resource allocation,and portfolio optimization.Traditional methods,including dynamic program-ming(DP)and greedy algorithms,have been effective in solving small problem instances but often struggle with scalability and efficiency as the problem size increases.DP,for instance,has exponential time complexity and can become computationally prohibitive for large problem instances.On the other hand,greedy algorithms offer faster solutions but may not always yield the optimal results,especially when the problem involves complex constraints or large numbers of items.This paper introduces a novel reinforcement learning(RL)approach to solve the knapsack problem by enhancing the state representation within the learning environment.We propose a representation where item weights and volumes are expressed as ratios relative to the knapsack’s capacity,and item values are normalized to represent their percentage of the total value across all items.This novel state modification leads to a 5%improvement in accuracy compared to the state-of-the-art RL-based algorithms,while significantly reducing execution time.Our RL-based method outperforms DP by over 9000 times in terms of speed,making it highly scalable for larger problem instances.Furthermore,we improve the performance of the RL model by incorporating Noisy layers into the neural network architecture.The addition of Noisy layers enhances the exploration capabilities of the agent,resulting in an additional accuracy boost of 0.2%–0.5%.The results demonstrate that our approach not only outperforms existing RL techniques,such as the Transformer model in terms of accuracy,but also provides a substantial improvement than DP in computational efficiency.This combination of enhanced accuracy and speed presents a promising solution for tackling large-scale optimization problems in real-world applications,where both precision and time are critical factors.展开更多
The rapid advancement of Industry 4.0 has revolutionized manufacturing,shifting production from centralized control to decentralized,intelligent systems.Smart factories are now expected to achieve high adaptability an...The rapid advancement of Industry 4.0 has revolutionized manufacturing,shifting production from centralized control to decentralized,intelligent systems.Smart factories are now expected to achieve high adaptability and resource efficiency,particularly in mass customization scenarios where production schedules must accommodate dynamic and personalized demands.To address the challenges of dynamic task allocation,uncertainty,and realtime decision-making,this paper proposes Pathfinder,a deep reinforcement learning-based scheduling framework.Pathfinder models scheduling data through three key matrices:execution time(the time required for a job to complete),completion time(the actual time at which a job is finished),and efficiency(the performance of executing a single job).By leveraging neural networks,Pathfinder extracts essential features from these matrices,enabling intelligent decision-making in dynamic production environments.Unlike traditional approaches with fixed scheduling rules,Pathfinder dynamically selects from ten diverse scheduling rules,optimizing decisions based on real-time environmental conditions.To further enhance scheduling efficiency,a specialized reward function is designed to support dynamic task allocation and real-time adjustments.This function helps Pathfinder continuously refine its scheduling strategy,improving machine utilization and minimizing job completion times.Through reinforcement learning,Pathfinder adapts to evolving production demands,ensuring robust performance in real-world applications.Experimental results demonstrate that Pathfinder outperforms traditional scheduling approaches,offering improved coordination and efficiency in smart factories.By integrating deep reinforcement learning,adaptable scheduling strategies,and an innovative reward function,Pathfinder provides an effective solution to the growing challenges of multi-robot job scheduling in mass customization environments.展开更多
Edge computing(EC)combined with the Internet of Things(IoT)provides a scalable and efficient solution for smart homes.Therapid proliferation of IoT devices poses real-time data processing and security challenges.EC ha...Edge computing(EC)combined with the Internet of Things(IoT)provides a scalable and efficient solution for smart homes.Therapid proliferation of IoT devices poses real-time data processing and security challenges.EC has become a transformative paradigm for addressing these challenges,particularly in intrusion detection and anomaly mitigation.The widespread connectivity of IoT edge networks has exposed them to various security threats,necessitating robust strategies to detect malicious activities.This research presents a privacy-preserving federated anomaly detection framework combined with Bayesian game theory(BGT)and double deep Q-learning(DDQL).The proposed framework integrates BGT to model attacker and defender interactions for dynamic threat level adaptation and resource availability.It also models a strategic layout between attackers and defenders that takes into account uncertainty.DDQL is incorporated to optimize decision-making and aids in learning optimal defense policies at the edge,thereby ensuring policy and decision optimization.Federated learning(FL)enables decentralized and unshared anomaly detection for sensitive data between devices.Data collection has been performed from various sensors in a real-time EC-IoT network to identify irregularities that occurred due to different attacks.The results reveal that the proposed model achieves high detection accuracy of up to 98%while maintaining low resource consumption.This study demonstrates the synergy between game theory and FL to strengthen anomaly detection in EC-IoT networks.展开更多
Current damage detection methods based on model updating and sensitivity Jacobian matrixes show a low convergence ratio and computational efficiency for online calculations.The aim of this paper is to construct a real...Current damage detection methods based on model updating and sensitivity Jacobian matrixes show a low convergence ratio and computational efficiency for online calculations.The aim of this paper is to construct a real-time automated damage detection method by developing a theory-assisted adaptive mutiagent twin delayed deep deterministic(TA2-MATD3)policy gradient algorithm.First,the theoretical framework of reinforcement-learning-driven damage detection is established.To address the disadvantages of traditional mutiagent twin delayed deep deterministic(MATD3)method,the theory-assisted mechanism and the adaptive experience playback mechanism are introduced.Moreover,a historical residential house built in 1889 was taken as an example,using its 12-month structural health monitoring data.TA2-MATD3 was compared with existing damage detection methods in terms of the convergence ratio,online computing efficiency,and damage detection accuracy.The results show that the computational efficiency of TA2-MATD3 is approximately 117–160 times that of the traditional methods.The convergence ratio of damage detection on the training set is approximately 97%,and that on the test set is in the range of 86.2%–91.9%.In addition,the main apparent damages found in the field survey were identified by TA2-MATD3.The results indicate that the proposed method can significantly improve the online computing efficiency and damage detection accuracy.This research can provide novel perspectives for the use of reinforcement learning methods to conduct damage detection in online structural health monitoring.展开更多
Unmanned Aerial Vehicle(UAV)stands as a burgeoning electric transportation carrier,holding substantial promise for the logistics sector.A reinforcement learning framework Centralized-S Proximal Policy Optimization(C-S...Unmanned Aerial Vehicle(UAV)stands as a burgeoning electric transportation carrier,holding substantial promise for the logistics sector.A reinforcement learning framework Centralized-S Proximal Policy Optimization(C-SPPO)based on centralized decision process and considering policy entropy(S)is proposed.The proposed framework aims to plan the best scheduling scheme with the objective of minimizing both the timeout of order requests and the flight impact of UAVs that may lead to conflicts.In this framework,the intents of matching act are generated through the observations of UAV agents,and the ultimate conflict-free matching results are output under the guidance of a centralized decision maker.Concurrently,a pre-activation operation is introduced to further enhance the cooperation among UAV agents.Simulation experiments based on real-world data from New York City are conducted.The results indicate that the proposed CSPPO outperforms the baseline algorithms in the Average Delay Time(ADT),the Maximum Delay Time(MDT),the Order Delay Rate(ODR),the Average Flight Distance(AFD),and the Flight Impact Ratio(FIR).Furthermore,the framework demonstrates scalability to scenarios of different sizes without requiring additional training.展开更多
Unmanned Aerial Vehicles(UAVs)have become integral components in smart city infrastructures,supporting applications such as emergency response,surveillance,and data collection.However,the high mobility and dynamic top...Unmanned Aerial Vehicles(UAVs)have become integral components in smart city infrastructures,supporting applications such as emergency response,surveillance,and data collection.However,the high mobility and dynamic topology of Flying Ad Hoc Networks(FANETs)present significant challenges for maintaining reliable,low-latency communication.Conventional geographic routing protocols often struggle in situations where link quality varies and mobility patterns are unpredictable.To overcome these limitations,this paper proposes an improved routing protocol based on reinforcement learning.This new approach integrates Q-learning with mechanisms that are both link-aware and mobility-aware.The proposed method optimizes the selection of relay nodes by using an adaptive reward function that takes into account energy consumption,delay,and link quality.Additionally,a Kalman filter is integrated to predict UAV mobility,improving the stability of communication links under dynamic network conditions.Simulation experiments were conducted using realistic scenarios,varying the number of UAVs to assess scalability.An analysis was conducted on key performance metrics,including the packet delivery ratio,end-to-end delay,and total energy consumption.The results demonstrate that the proposed approach significantly improves the packet delivery ratio by 12%–15%and reduces delay by up to 25.5%when compared to conventional GEO and QGEO protocols.However,this improvement comes at the cost of higher energy consumption due to additional computations and control overhead.Despite this trade-off,the proposed solution ensures reliable and efficient communication,making it well-suited for large-scale UAV networks operating in complex urban environments.展开更多
We demonstrate a reinforcement learning(RL)-based control framework for optimizing evaporative cooling in the preparation of strongly interacting degenerate Fermi gases of 6Li.Using a Soft Actor-Critic(SAC)algorithm,t...We demonstrate a reinforcement learning(RL)-based control framework for optimizing evaporative cooling in the preparation of strongly interacting degenerate Fermi gases of 6Li.Using a Soft Actor-Critic(SAC)algorithm,the system autonomously explores a high-dimensional parameter space to learn optimal cooling trajectories.Compared to conventional exponential ramps,our method achieves up to 130%improvement in atomic density within 0.5 second,revealing non-trivial control strategies that balance fast evaporation and thermalization.While our current optimization focuses on the evaporation stage,future integration of other cooling stages,such as gray molasses cooling,could further extend RL to the full preparation pipeline.Our result highlights the promise of RL as a general tool for closed-loop quantum control and automated calibration in complex atomic physics experiments.展开更多
Vehicle Edge Computing(VEC)and Cloud Computing(CC)significantly enhance the processing efficiency of delay-sensitive and computation-intensive applications by offloading compute-intensive tasks from resource-constrain...Vehicle Edge Computing(VEC)and Cloud Computing(CC)significantly enhance the processing efficiency of delay-sensitive and computation-intensive applications by offloading compute-intensive tasks from resource-constrained onboard devices to nearby Roadside Unit(RSU),thereby achieving lower delay and energy consumption.However,due to the limited storage capacity and energy budget of RSUs,it is challenging to meet the demands of the highly dynamic Internet of Vehicles(IoV)environment.Therefore,determining reasonable service caching and computation offloading strategies is crucial.To address this,this paper proposes a joint service caching scheme for cloud-edge collaborative IoV computation offloading.By modeling the dynamic optimization problem using Markov Decision Processes(MDP),the scheme jointly optimizes task delay,energy consumption,load balancing,and privacy entropy to achieve better quality of service.Additionally,a dynamic adaptive multi-objective deep reinforcement learning algorithm is proposed.Each Double Deep Q-Network(DDQN)agent obtains rewards for different objectives based on distinct reward functions and dynamically updates the objective weights by learning the value changes between objectives using Radial Basis Function Networks(RBFN),thereby efficiently approximating the Pareto-optimal decisions for multiple objectives.Extensive experiments demonstrate that the proposed algorithm can better coordinate the three-tier computing resources of cloud,edge,and vehicles.Compared to existing algorithms,the proposed method reduces task delay and energy consumption by 10.64%and 5.1%,respectively.展开更多
In multiple Unmanned Aerial Vehicles(UAV)systems,achieving efficient navigation is essential for executing complex tasks and enhancing autonomy.Traditional navigation methods depend on predefined control strategies an...In multiple Unmanned Aerial Vehicles(UAV)systems,achieving efficient navigation is essential for executing complex tasks and enhancing autonomy.Traditional navigation methods depend on predefined control strategies and trajectory planning and often perform poorly in complex environments.To improve the UAV-environment interaction efficiency,this study proposes a multi-UAV integrated navigation algorithm based on Deep Reinforcement Learning(DRL).This algorithm integrates the Inertial Navigation System(INS),Global Navigation Satellite System(GNSS),and Visual Navigation System(VNS)for comprehensive information fusion.Specifically,an improved multi-UAV integrated navigation algorithm called Information Fusion with MultiAgent Deep Deterministic Policy Gradient(IF-MADDPG)was developed.This algorithm enables UAVs to learn collaboratively and optimize their flight trajectories in real time.Through simulations and experiments,test scenarios in GNSS-denied environments were constructed to evaluate the effectiveness of the algorithm.The experimental results demonstrate that the IF-MADDPG algorithm significantly enhances the collaborative navigation capabilities of multiple UAVs in formation maintenance and GNSS-denied environments.Additionally,it has advantages in terms of mission completion time.This study provides a novel approach for efficient collaboration in multi-UAV systems,which significantly improves the robustness and adaptability of navigation systems.展开更多
In this work,we consider an Unmanned Aerial Vehicle(UAV)-aided covert transmission network,which adopts the uplink transmission of Communication Nodes(CNs)as a cover to facilitate covert transmission to a Primary Comm...In this work,we consider an Unmanned Aerial Vehicle(UAV)-aided covert transmission network,which adopts the uplink transmission of Communication Nodes(CNs)as a cover to facilitate covert transmission to a Primary Communication Node(PCN).Specifically,all nodes transmit to the UAV exploiting uplink non-Orthogonal Multiple Access(NOMA),while the UAV performs covert transmission to the PCN at the same frequency.To minimize the average age of covert information,we formulate a joint optimization problem of UAV trajectory and power allocation designing subject to multi-dimensional constraints including covertness demand,communication quality requirement,maximum flying speed,and the maximum available resources.To address this problem,we embed Signomial Programming(SP)into Deep Reinforcement Learning(DRL)and propose a DRL framework capable of handling the constrained Markov decision processes,named SP embedded Soft Actor-Critic(SSAC).By adopting SSAC,we achieve the joint optimization of UAV trajectory and power allocation.Our simulations show the optimized UAV trajectory and verify the superiority of SSAC compared with various existing baseline schemes.The results of this study suggest that by maintaining appropriate distances from both the PCN and CNs,one can effectively enhance the performance of covert communication by reducing the detection probability of the CNs.展开更多
In the mid-to-late stages of gas reservoir development,liquid loading in gas wells becomes a common challenge.Plunger lift,as an intermittent production technique,is widely used for deliquification in gas wells.With t...In the mid-to-late stages of gas reservoir development,liquid loading in gas wells becomes a common challenge.Plunger lift,as an intermittent production technique,is widely used for deliquification in gas wells.With the advancement of big data and artificial intelligence,the future of oil and gas field development is trending towards intelligent,unmanned,and automated operations.Currently,the optimization of plunger lift working systems is primarily based on expert experience and manual control,focusing mainly on the success of the plunger lift without adequately considering the impact of different working systems on gas production.Additionally,liquid loading in gas wells is a dynamic process,and the intermittent nature of plunger lift requires accurate modeling;using constant inflow dynamics to describe reservoir flow introduces significant errors.To address these challenges,this study establishes a coupled wellbore-reservoir model for plunger lift wells and validates the computational wellhead pressure results against field measurements.Building on this model,a novel optimization control algorithm based on the deep deterministic policy gradient(DDPG)framework is proposed.The algorithm aims to optimize plunger lift working systems to balance overall reservoir pressure,stabilize gas-water ratios,and maximize gas production.Through simulation experiments in three different production optimization scenarios,the effectiveness of reinforcement learning algorithms(including RL,PPO,DQN,and the proposed DDPG)and traditional optimization algorithms(including GA,PSO,and Bayesian optimization)in enhancing production efficiency is compared.The results demonstrate that the coupled model provides highly accurate calculations and can precisely describe the transient production of wellbore and gas reservoir systems.The proposed DDPG algorithm achieves the highest reward value during training with minimal error,leading to a potential increase in cumulative gas production by up to 5%and cumulative liquid production by 252%.The DDPG algorithm exhibits robustness across different optimization scenarios,showcasing excellent adaptability and generalization capabilities.展开更多
The high maneuverability of modern fighters in close air combat imposes significant cognitive demands on pilots,making rapid,accurate decision-making challenging.While reinforcement learning(RL)has shown promise in th...The high maneuverability of modern fighters in close air combat imposes significant cognitive demands on pilots,making rapid,accurate decision-making challenging.While reinforcement learning(RL)has shown promise in this domain,the existing methods often lack strategic depth and generalization in complex,high-dimensional environments.To address these limitations,this paper proposes an optimized self-play method enhanced by advancements in fighter modeling,neural network design,and algorithmic frameworks.This study employs a six-degree-of-freedom(6-DOF)F-16 fighter model based on open-source aerodynamic data,featuring airborne equipment and a realistic visual simulation platform,unlike traditional 3-DOF models.To capture temporal dynamics,Long Short-Term Memory(LSTM)layers are integrated into the neural network,complemented by delayed input stacking.The RL environment incorporates expert strategies,curiositydriven rewards,and curriculum learning to improve adaptability and strategic decision-making.Experimental results demonstrate that the proposed approach achieves a winning rate exceeding90%against classical single-agent methods.Additionally,through enhanced 3D visual platforms,we conducted human-agent confrontation experiments,where the agent attained an average winning rate of over 75%.The agent's maneuver trajectories closely align with human pilot strategies,showcasing its potential in decision-making and pilot training applications.This study highlights the effectiveness of integrating advanced modeling and self-play techniques in developing robust air combat decision-making systems.展开更多
基金supported by the Fundamental Research Funds for the Central Universities(No.2023JBZX011)the Aeronautical Science Foundation of China(No.202300010M5001).
文摘1 Introduction Constrained Reinforcement Learning(CRL),modeled as a Constrained Markov Decision Process(CMDP)[1,2],is commonly used to address applications with security restrictions.Previous works[3]primarily focused on the single-constraint issue,overlooking the more common multi-constraint setting which involves extensive computations and combinatorial optimization of multiple Lagrange multipliers.
基金supported by the National Natural Science Foundation of China(Nos.12272104,U22B2013).
文摘This paper investigates the challenges associated with Unmanned Aerial Vehicle (UAV) collaborative search and target tracking in dynamic and unknown environments characterized by limited field of view. The primary objective is to explore the unknown environments to locate and track targets effectively. To address this problem, we propose a novel Multi-Agent Reinforcement Learning (MARL) method based on Graph Neural Network (GNN). Firstly, a method is introduced for encoding continuous-space multi-UAV problem data into spatial graphs which establish essential relationships among agents, obstacles, and targets. Secondly, a Graph AttenTion network (GAT) model is presented, which focuses exclusively on adjacent nodes, learns attention weights adaptively and allows agents to better process information in dynamic environments. Reward functions are specifically designed to tackle exploration challenges in environments with sparse rewards. By introducing a framework that integrates centralized training and distributed execution, the advancement of models is facilitated. Simulation results show that the proposed method outperforms the existing MARL method in search rate and tracking performance with less collisions. The experiments show that the proposed method can be extended to applications with a larger number of agents, which provides a potential solution to the challenging problem of multi-UAV autonomous tracking in dynamic unknown environments.
基金Supported by National Key Research and Development Program of China(Grant No.2022YFE0117100)National Science Foundation of China(Grant No.52102468,52325212)Fundamental Research Funds for the Central Universities。
文摘To solve problems of poor security guarantee and insufficient training efficiency in the conventional reinforcement learning methods for decision-making,this study proposes a hybrid framework to combine deep reinforcement learning with rule-based decision-making methods.A risk assessment model for lane-change maneuvers considering uncertain predictions of surrounding vehicles is established as a safety filter to improve learning efficiency while correcting dangerous actions for safety enhancement.On this basis,a Risk-fused DDQN is constructed utilizing the model-based risk assessment and supervision mechanism.The proposed reinforcement learning algorithm sets up a separate experience buffer for dangerous trials and punishes such actions,which is shown to improve the sampling efficiency and training outcomes.Compared with conventional DDQN methods,the proposed algorithm improves the convergence value of cumulated reward by 7.6%and 2.2%in the two constructed scenarios in the simulation study and reduces the number of training episodes by 52.2%and 66.8%respectively.The success rate of lane change is improved by 57.3%while the time headway is increased at least by 16.5%in real vehicle tests,which confirms the higher training efficiency,scenario adaptability,and security of the proposed Risk-fused DDQN.
基金co-supported by the National Natural Science Foundation of China(No.62103432)the China Postdoctoral Science Foundation(No.284881)the Young Talent fund of University Association for Science and Technology in Shaanxi,China(No.20210108)。
文摘Exo-atmospheric vehicles are constrained by limited maneuverability,which leads to the contradiction between evasive maneuver and precision strike.To address the problem of Integrated Evasion and Impact(IEI)decision under multi-constraint conditions,a hierarchical intelligent decision-making method based on Deep Reinforcement Learning(DRL)was proposed.First,an intelligent decision-making framework of“DRL evasion decision”+“impact prediction guidance decision”was established:it takes the impact point deviation correction ability as the constraint and the maximum miss distance as the objective,and effectively solves the problem of poor decisionmaking effect caused by the large IEI decision space.Second,to solve the sparse reward problem faced by evasion decision-making,a hierarchical decision-making method consisting of maneuver timing decision and maneuver duration decision was proposed,and the corresponding Markov Decision Process(MDP)was designed.A detailed simulation experiment was designed to analyze the advantages and computational complexity of the proposed method.Simulation results show that the proposed model has good performance and low computational resource requirement.The minimum miss distance is 21.3 m under the condition of guaranteeing the impact point accuracy,and the single decision-making time is 4.086 ms on an STM32F407 single-chip microcomputer,which has engineering application value.
基金National Key Research and Development Program(2021YFB2900604)。
文摘Low Earth orbit(LEO)satellite networks exhibit distinct characteristics,e.g.,limited resources of individual satellite nodes and dynamic network topology,which have brought many challenges for routing algorithms.To satisfy quality of service(QoS)requirements of various users,it is critical to research efficient routing strategies to fully utilize satellite resources.This paper proposes a multi-QoS information optimized routing algorithm based on reinforcement learning for LEO satellite networks,which guarantees high level assurance demand services to be prioritized under limited satellite resources while considering the load balancing performance of the satellite networks for low level assurance demand services to ensure the full and effective utilization of satellite resources.An auxiliary path search algorithm is proposed to accelerate the convergence of satellite routing algorithm.Simulation results show that the generated routing strategy can timely process and fully meet the QoS demands of high assurance services while effectively improving the load balancing performance of the link.
基金The National Natural Science Foundation of China(62136008,62293541)The Beijing Natural Science Foundation(4232056)The Beijing Nova Program(20240484514).
文摘Cooperative multi-agent reinforcement learning(MARL)is a key technology for enabling cooperation in complex multi-agent systems.It has achieved remarkable progress in areas such as gaming,autonomous driving,and multi-robot control.Empowering cooperative MARL with multi-task decision-making capabilities is expected to further broaden its application scope.In multi-task scenarios,cooperative MARL algorithms need to address 3 types of multi-task problems:reward-related multi-task,arising from different reward functions;multi-domain multi-task,caused by differences in state and action spaces,state transition functions;and scalability-related multi-task,resulting from the dynamic variation in the number of agents.Most existing studies focus on scalability-related multitask problems.However,with the increasing integration between large language models(LLMs)and multi-agent systems,a growing number of LLM-based multi-agent systems have emerged,enabling more complex multi-task cooperation.This paper provides a comprehensive review of the latest advances in this field.By combining multi-task reinforcement learning with cooperative MARL,we categorize and analyze the 3 major types of multi-task problems under multi-agent settings,offering more fine-grained classifications and summarizing key insights for each.In addition,we summarize commonly used benchmarks and discuss future directions of research in this area,which hold promise for further enhancing the multi-task cooperation capabilities of multi-agent systems and expanding their practical applications in the real world.
文摘Small modular reactor(SMR)belongs to the research forefront of nuclear reactor technology.Nowadays,advancement of intelligent control technologies paves a new way to the design and build of unmanned SMR.The autonomous control process of SMR can be divided into three stages,say,state diagnosis,autonomous decision-making and coordinated control.In this paper,the autonomous state recognition and task planning of unmanned SMR are investigated.An operating condition recognition method based on the knowledge base of SMR operation is proposed by using the artificial neural network(ANN)technology,which constructs a basis for the state judgment of intelligent reactor control path planning.An improved reinforcement learning path planning algorithm is utilized to implement the path transfer decision-makingThis algorithm performs condition transitions with minimal cost under specified modes.In summary,the full range control path intelligent decision-planning technology of SMR is realized,thus provides some theoretical basis for the design and build of unmanned SMR in the future.
基金supported by Princess Nourah bint Abdulrahman University Researchers Supporting Project number(PNURSP2025R909),Princess Nourah bint Abdulrahman University,Riyadh,Saudi Arabia.
文摘The exponential growth of Internet ofThings(IoT)devices has created unprecedented challenges in data processing and resource management for time-critical applications.Traditional cloud computing paradigms cannot meet the stringent latency requirements of modern IoT systems,while pure edge computing faces resource constraints that limit processing capabilities.This paper addresses these challenges by proposing a novel Deep Reinforcement Learning(DRL)-enhanced priority-based scheduling framework for hybrid edge-cloud computing environments.Our approach integrates adaptive priority assignment with a two-level concurrency control protocol that ensures both optimal performance and data consistency.The framework introduces three key innovations:(1)a DRL-based dynamic priority assignmentmechanism that learns fromsystem behavior,(2)a hybrid concurrency control protocol combining local edge validation with global cloud coordination,and(3)an integrated mathematical model that formalizes sensor-driven transactions across edge-cloud architectures.Extensive simulations across diverse workload scenarios demonstrate significant quantitative improvements:40%latency reduction,25%throughput increase,85%resource utilization(compared to 60%for heuristicmethods),40%reduction in energy consumption(300 vs.500 J per task),and 50%improvement in scalability factor(1.8 vs.1.2 for EDF)compared to state-of-the-art heuristic and meta-heuristic approaches.These results establish the framework as a robust solution for large-scale IoT and autonomous applications requiring real-time processing with consistency guarantees.
基金supported in part by the Research Start-Up Funds of South-Central Minzu University under Grants YZZ23002,YZY23001,and YZZ18006in part by the Hubei Provincial Natural Science Foundation of China under Grants 2024AFB842 and 2023AFB202+3 种基金in part by the Knowledge Innovation Program of Wuhan Basic Research underGrant 2023010201010151in part by the Spring Sunshine Program of Ministry of Education of the People’s Republic of China under Grant HZKY20220331in part by the Funds for Academic Innovation Teams and Research Platformof South-CentralMinzu University Grant Number:XT224003,PTZ24001in part by the Career Development Fund(CDF)of the Agency for Science,Technology and Research(A*STAR)(Grant Number:C233312007).
文摘The knapsack problem is a classical combinatorial optimization problem widely encountered in areas such as logistics,resource allocation,and portfolio optimization.Traditional methods,including dynamic program-ming(DP)and greedy algorithms,have been effective in solving small problem instances but often struggle with scalability and efficiency as the problem size increases.DP,for instance,has exponential time complexity and can become computationally prohibitive for large problem instances.On the other hand,greedy algorithms offer faster solutions but may not always yield the optimal results,especially when the problem involves complex constraints or large numbers of items.This paper introduces a novel reinforcement learning(RL)approach to solve the knapsack problem by enhancing the state representation within the learning environment.We propose a representation where item weights and volumes are expressed as ratios relative to the knapsack’s capacity,and item values are normalized to represent their percentage of the total value across all items.This novel state modification leads to a 5%improvement in accuracy compared to the state-of-the-art RL-based algorithms,while significantly reducing execution time.Our RL-based method outperforms DP by over 9000 times in terms of speed,making it highly scalable for larger problem instances.Furthermore,we improve the performance of the RL model by incorporating Noisy layers into the neural network architecture.The addition of Noisy layers enhances the exploration capabilities of the agent,resulting in an additional accuracy boost of 0.2%–0.5%.The results demonstrate that our approach not only outperforms existing RL techniques,such as the Transformer model in terms of accuracy,but also provides a substantial improvement than DP in computational efficiency.This combination of enhanced accuracy and speed presents a promising solution for tackling large-scale optimization problems in real-world applications,where both precision and time are critical factors.
基金supported by National Natural Science Foundation of China under Grant No.62372110Fujian Provincial Natural Science of Foundation under Grants 2023J02008,2024H0009.
文摘The rapid advancement of Industry 4.0 has revolutionized manufacturing,shifting production from centralized control to decentralized,intelligent systems.Smart factories are now expected to achieve high adaptability and resource efficiency,particularly in mass customization scenarios where production schedules must accommodate dynamic and personalized demands.To address the challenges of dynamic task allocation,uncertainty,and realtime decision-making,this paper proposes Pathfinder,a deep reinforcement learning-based scheduling framework.Pathfinder models scheduling data through three key matrices:execution time(the time required for a job to complete),completion time(the actual time at which a job is finished),and efficiency(the performance of executing a single job).By leveraging neural networks,Pathfinder extracts essential features from these matrices,enabling intelligent decision-making in dynamic production environments.Unlike traditional approaches with fixed scheduling rules,Pathfinder dynamically selects from ten diverse scheduling rules,optimizing decisions based on real-time environmental conditions.To further enhance scheduling efficiency,a specialized reward function is designed to support dynamic task allocation and real-time adjustments.This function helps Pathfinder continuously refine its scheduling strategy,improving machine utilization and minimizing job completion times.Through reinforcement learning,Pathfinder adapts to evolving production demands,ensuring robust performance in real-world applications.Experimental results demonstrate that Pathfinder outperforms traditional scheduling approaches,offering improved coordination and efficiency in smart factories.By integrating deep reinforcement learning,adaptable scheduling strategies,and an innovative reward function,Pathfinder provides an effective solution to the growing challenges of multi-robot job scheduling in mass customization environments.
基金The authors extend their appreciation to the Deanship of Research and Graduate Studies at King Khalid University for funding this work through the Large Group Project under grant number(RGP2/337/46)The research team thanks the Deanship of Graduate Studies and Scientific Research at Najran University for supporting the research project through the Nama’a program,with the project code NU/GP/SERC/13/352-4.
文摘Edge computing(EC)combined with the Internet of Things(IoT)provides a scalable and efficient solution for smart homes.Therapid proliferation of IoT devices poses real-time data processing and security challenges.EC has become a transformative paradigm for addressing these challenges,particularly in intrusion detection and anomaly mitigation.The widespread connectivity of IoT edge networks has exposed them to various security threats,necessitating robust strategies to detect malicious activities.This research presents a privacy-preserving federated anomaly detection framework combined with Bayesian game theory(BGT)and double deep Q-learning(DDQL).The proposed framework integrates BGT to model attacker and defender interactions for dynamic threat level adaptation and resource availability.It also models a strategic layout between attackers and defenders that takes into account uncertainty.DDQL is incorporated to optimize decision-making and aids in learning optimal defense policies at the edge,thereby ensuring policy and decision optimization.Federated learning(FL)enables decentralized and unshared anomaly detection for sensitive data between devices.Data collection has been performed from various sensors in a real-time EC-IoT network to identify irregularities that occurred due to different attacks.The results reveal that the proposed model achieves high detection accuracy of up to 98%while maintaining low resource consumption.This study demonstrates the synergy between game theory and FL to strengthen anomaly detection in EC-IoT networks.
基金supported by National Key Research and Development Program of China(2023YFF0906100)National Natural Science Foundation of China(52408008)Key Research and Development Program of Jiangsu Province(BE2022833).
文摘Current damage detection methods based on model updating and sensitivity Jacobian matrixes show a low convergence ratio and computational efficiency for online calculations.The aim of this paper is to construct a real-time automated damage detection method by developing a theory-assisted adaptive mutiagent twin delayed deep deterministic(TA2-MATD3)policy gradient algorithm.First,the theoretical framework of reinforcement-learning-driven damage detection is established.To address the disadvantages of traditional mutiagent twin delayed deep deterministic(MATD3)method,the theory-assisted mechanism and the adaptive experience playback mechanism are introduced.Moreover,a historical residential house built in 1889 was taken as an example,using its 12-month structural health monitoring data.TA2-MATD3 was compared with existing damage detection methods in terms of the convergence ratio,online computing efficiency,and damage detection accuracy.The results show that the computational efficiency of TA2-MATD3 is approximately 117–160 times that of the traditional methods.The convergence ratio of damage detection on the training set is approximately 97%,and that on the test set is in the range of 86.2%–91.9%.In addition,the main apparent damages found in the field survey were identified by TA2-MATD3.The results indicate that the proposed method can significantly improve the online computing efficiency and damage detection accuracy.This research can provide novel perspectives for the use of reinforcement learning methods to conduct damage detection in online structural health monitoring.
基金the support of the Chinese Special Research Project for Civil Aircraft(No.MJZ17N22)the National Natural Science Foundation of China(Nos.U2133207,U2333214)+1 种基金the China Postdoctoral Science Foundation(No.2023M741687)the National Social Science Fund of China(No.22&ZD169)。
文摘Unmanned Aerial Vehicle(UAV)stands as a burgeoning electric transportation carrier,holding substantial promise for the logistics sector.A reinforcement learning framework Centralized-S Proximal Policy Optimization(C-SPPO)based on centralized decision process and considering policy entropy(S)is proposed.The proposed framework aims to plan the best scheduling scheme with the objective of minimizing both the timeout of order requests and the flight impact of UAVs that may lead to conflicts.In this framework,the intents of matching act are generated through the observations of UAV agents,and the ultimate conflict-free matching results are output under the guidance of a centralized decision maker.Concurrently,a pre-activation operation is introduced to further enhance the cooperation among UAV agents.Simulation experiments based on real-world data from New York City are conducted.The results indicate that the proposed CSPPO outperforms the baseline algorithms in the Average Delay Time(ADT),the Maximum Delay Time(MDT),the Order Delay Rate(ODR),the Average Flight Distance(AFD),and the Flight Impact Ratio(FIR).Furthermore,the framework demonstrates scalability to scenarios of different sizes without requiring additional training.
基金funded by Hung Yen University of Technology and Education under grand number UTEHY.L.2025.62.
文摘Unmanned Aerial Vehicles(UAVs)have become integral components in smart city infrastructures,supporting applications such as emergency response,surveillance,and data collection.However,the high mobility and dynamic topology of Flying Ad Hoc Networks(FANETs)present significant challenges for maintaining reliable,low-latency communication.Conventional geographic routing protocols often struggle in situations where link quality varies and mobility patterns are unpredictable.To overcome these limitations,this paper proposes an improved routing protocol based on reinforcement learning.This new approach integrates Q-learning with mechanisms that are both link-aware and mobility-aware.The proposed method optimizes the selection of relay nodes by using an adaptive reward function that takes into account energy consumption,delay,and link quality.Additionally,a Kalman filter is integrated to predict UAV mobility,improving the stability of communication links under dynamic network conditions.Simulation experiments were conducted using realistic scenarios,varying the number of UAVs to assess scalability.An analysis was conducted on key performance metrics,including the packet delivery ratio,end-to-end delay,and total energy consumption.The results demonstrate that the proposed approach significantly improves the packet delivery ratio by 12%–15%and reduces delay by up to 25.5%when compared to conventional GEO and QGEO protocols.However,this improvement comes at the cost of higher energy consumption due to additional computations and control overhead.Despite this trade-off,the proposed solution ensures reliable and efficient communication,making it well-suited for large-scale UAV networks operating in complex urban environments.
基金supported by the Innovation Program for Quantum Science and Technology of China(Grant No.2024ZD0300100)the National Basic Research Program of China(Grant No.2021YFA1400900)+1 种基金Shanghai Municipal Science and Technology(Grant Nos.25TQ003,2019SHZDZX01,and 24DP2600100)the National Natural Science Foundation of China(Grant No.12304555).
文摘We demonstrate a reinforcement learning(RL)-based control framework for optimizing evaporative cooling in the preparation of strongly interacting degenerate Fermi gases of 6Li.Using a Soft Actor-Critic(SAC)algorithm,the system autonomously explores a high-dimensional parameter space to learn optimal cooling trajectories.Compared to conventional exponential ramps,our method achieves up to 130%improvement in atomic density within 0.5 second,revealing non-trivial control strategies that balance fast evaporation and thermalization.While our current optimization focuses on the evaporation stage,future integration of other cooling stages,such as gray molasses cooling,could further extend RL to the full preparation pipeline.Our result highlights the promise of RL as a general tool for closed-loop quantum control and automated calibration in complex atomic physics experiments.
基金supported by Key Science and Technology Program of Henan Province,China(Grant Nos.242102210147,242102210027)Fujian Province Young and Middle aged Teacher Education Research Project(Science and Technology Category)(No.JZ240101)(Corresponding author:Dong Yuan).
文摘Vehicle Edge Computing(VEC)and Cloud Computing(CC)significantly enhance the processing efficiency of delay-sensitive and computation-intensive applications by offloading compute-intensive tasks from resource-constrained onboard devices to nearby Roadside Unit(RSU),thereby achieving lower delay and energy consumption.However,due to the limited storage capacity and energy budget of RSUs,it is challenging to meet the demands of the highly dynamic Internet of Vehicles(IoV)environment.Therefore,determining reasonable service caching and computation offloading strategies is crucial.To address this,this paper proposes a joint service caching scheme for cloud-edge collaborative IoV computation offloading.By modeling the dynamic optimization problem using Markov Decision Processes(MDP),the scheme jointly optimizes task delay,energy consumption,load balancing,and privacy entropy to achieve better quality of service.Additionally,a dynamic adaptive multi-objective deep reinforcement learning algorithm is proposed.Each Double Deep Q-Network(DDQN)agent obtains rewards for different objectives based on distinct reward functions and dynamically updates the objective weights by learning the value changes between objectives using Radial Basis Function Networks(RBFN),thereby efficiently approximating the Pareto-optimal decisions for multiple objectives.Extensive experiments demonstrate that the proposed algorithm can better coordinate the three-tier computing resources of cloud,edge,and vehicles.Compared to existing algorithms,the proposed method reduces task delay and energy consumption by 10.64%and 5.1%,respectively.
基金co-supported by the National Natural Science Foundation of China(Nos.92371201 and 52192633)the Natural Science Foundation of Shaanxi Province of China(No.2022JC-03)the Aeronautical Science Foundation of China(No.ASFC-20220019070002)。
文摘In multiple Unmanned Aerial Vehicles(UAV)systems,achieving efficient navigation is essential for executing complex tasks and enhancing autonomy.Traditional navigation methods depend on predefined control strategies and trajectory planning and often perform poorly in complex environments.To improve the UAV-environment interaction efficiency,this study proposes a multi-UAV integrated navigation algorithm based on Deep Reinforcement Learning(DRL).This algorithm integrates the Inertial Navigation System(INS),Global Navigation Satellite System(GNSS),and Visual Navigation System(VNS)for comprehensive information fusion.Specifically,an improved multi-UAV integrated navigation algorithm called Information Fusion with MultiAgent Deep Deterministic Policy Gradient(IF-MADDPG)was developed.This algorithm enables UAVs to learn collaboratively and optimize their flight trajectories in real time.Through simulations and experiments,test scenarios in GNSS-denied environments were constructed to evaluate the effectiveness of the algorithm.The experimental results demonstrate that the IF-MADDPG algorithm significantly enhances the collaborative navigation capabilities of multiple UAVs in formation maintenance and GNSS-denied environments.Additionally,it has advantages in terms of mission completion time.This study provides a novel approach for efficient collaboration in multi-UAV systems,which significantly improves the robustness and adaptability of navigation systems.
基金This study was co-supported by the National Natural Science Foundation of China(No.62025110&62271093)the Natural Science Foundation of Chongqing,China(No.CSTB2023NSCQ-LZX0108).
文摘In this work,we consider an Unmanned Aerial Vehicle(UAV)-aided covert transmission network,which adopts the uplink transmission of Communication Nodes(CNs)as a cover to facilitate covert transmission to a Primary Communication Node(PCN).Specifically,all nodes transmit to the UAV exploiting uplink non-Orthogonal Multiple Access(NOMA),while the UAV performs covert transmission to the PCN at the same frequency.To minimize the average age of covert information,we formulate a joint optimization problem of UAV trajectory and power allocation designing subject to multi-dimensional constraints including covertness demand,communication quality requirement,maximum flying speed,and the maximum available resources.To address this problem,we embed Signomial Programming(SP)into Deep Reinforcement Learning(DRL)and propose a DRL framework capable of handling the constrained Markov decision processes,named SP embedded Soft Actor-Critic(SSAC).By adopting SSAC,we achieve the joint optimization of UAV trajectory and power allocation.Our simulations show the optimized UAV trajectory and verify the superiority of SSAC compared with various existing baseline schemes.The results of this study suggest that by maintaining appropriate distances from both the PCN and CNs,one can effectively enhance the performance of covert communication by reducing the detection probability of the CNs.
基金support from Science Foundation of China University of Petroleum,Beijing(No.2462023YJRC019)National Natural Science Foundation of China(No.52204059)Key Core Technology Research Project Foundation of PetroChina Group(No.2023ZG18).
文摘In the mid-to-late stages of gas reservoir development,liquid loading in gas wells becomes a common challenge.Plunger lift,as an intermittent production technique,is widely used for deliquification in gas wells.With the advancement of big data and artificial intelligence,the future of oil and gas field development is trending towards intelligent,unmanned,and automated operations.Currently,the optimization of plunger lift working systems is primarily based on expert experience and manual control,focusing mainly on the success of the plunger lift without adequately considering the impact of different working systems on gas production.Additionally,liquid loading in gas wells is a dynamic process,and the intermittent nature of plunger lift requires accurate modeling;using constant inflow dynamics to describe reservoir flow introduces significant errors.To address these challenges,this study establishes a coupled wellbore-reservoir model for plunger lift wells and validates the computational wellhead pressure results against field measurements.Building on this model,a novel optimization control algorithm based on the deep deterministic policy gradient(DDPG)framework is proposed.The algorithm aims to optimize plunger lift working systems to balance overall reservoir pressure,stabilize gas-water ratios,and maximize gas production.Through simulation experiments in three different production optimization scenarios,the effectiveness of reinforcement learning algorithms(including RL,PPO,DQN,and the proposed DDPG)and traditional optimization algorithms(including GA,PSO,and Bayesian optimization)in enhancing production efficiency is compared.The results demonstrate that the coupled model provides highly accurate calculations and can precisely describe the transient production of wellbore and gas reservoir systems.The proposed DDPG algorithm achieves the highest reward value during training with minimal error,leading to a potential increase in cumulative gas production by up to 5%and cumulative liquid production by 252%.The DDPG algorithm exhibits robustness across different optimization scenarios,showcasing excellent adaptability and generalization capabilities.
基金co-supported by the National Natural Science Foundation of China(No.91852115)。
文摘The high maneuverability of modern fighters in close air combat imposes significant cognitive demands on pilots,making rapid,accurate decision-making challenging.While reinforcement learning(RL)has shown promise in this domain,the existing methods often lack strategic depth and generalization in complex,high-dimensional environments.To address these limitations,this paper proposes an optimized self-play method enhanced by advancements in fighter modeling,neural network design,and algorithmic frameworks.This study employs a six-degree-of-freedom(6-DOF)F-16 fighter model based on open-source aerodynamic data,featuring airborne equipment and a realistic visual simulation platform,unlike traditional 3-DOF models.To capture temporal dynamics,Long Short-Term Memory(LSTM)layers are integrated into the neural network,complemented by delayed input stacking.The RL environment incorporates expert strategies,curiositydriven rewards,and curriculum learning to improve adaptability and strategic decision-making.Experimental results demonstrate that the proposed approach achieves a winning rate exceeding90%against classical single-agent methods.Additionally,through enhanced 3D visual platforms,we conducted human-agent confrontation experiments,where the agent attained an average winning rate of over 75%.The agent's maneuver trajectories closely align with human pilot strategies,showcasing its potential in decision-making and pilot training applications.This study highlights the effectiveness of integrating advanced modeling and self-play techniques in developing robust air combat decision-making systems.