Unmanned Aerial Vehicles (UAVs) have become integral components in smart city infrastructures, supporting applications such as emergency response, surveillance, and data collection. However, the high mobility and dynamic topology of Flying Ad Hoc Networks (FANETs) present significant challenges for maintaining reliable, low-latency communication. Conventional geographic routing protocols often struggle when link quality varies and mobility patterns are unpredictable. To overcome these limitations, this paper proposes an improved routing protocol based on reinforcement learning. The approach integrates Q-learning with link-aware and mobility-aware mechanisms, optimizing relay-node selection through an adaptive reward function that accounts for energy consumption, delay, and link quality. Additionally, a Kalman filter is integrated to predict UAV mobility, improving the stability of communication links under dynamic network conditions. Simulation experiments were conducted in realistic scenarios, varying the number of UAVs to assess scalability, and key performance metrics were analyzed, including packet delivery ratio, end-to-end delay, and total energy consumption. The results demonstrate that the proposed approach improves the packet delivery ratio by 12%–15% and reduces delay by up to 25.5% compared to the conventional GEO and QGEO protocols. However, this improvement comes at the cost of higher energy consumption due to additional computation and control overhead. Despite this trade-off, the proposed solution ensures reliable and efficient communication, making it well suited to large-scale UAV networks operating in complex urban environments.
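To make the routing mechanism concrete, the sketch below shows one plausible shape of such a Q-learning relay update with a scalarized reward over link quality, delay, and energy. The weight values, learning rates, epsilon-greedy selection, and normalized inputs are illustrative assumptions rather than the paper's exact formulation; the Kalman-predicted UAV positions would feed the `link_quality` estimate.

```python
import random

# Illustrative constants; the paper's actual weights and rates are not given here.
ALPHA, GAMMA = 0.1, 0.9                    # learning rate, discount factor
W_LINK, W_DELAY, W_ENERGY = 0.5, 0.3, 0.2  # adaptive reward weights (assumed)

def reward(link_quality, delay, energy):
    """Favor strong links, penalize delay and energy; inputs normalized to [0, 1]."""
    return W_LINK * link_quality - W_DELAY * delay - W_ENERGY * energy

def update_q(q, node, dst, relay, link_quality, delay, energy, best_next_q):
    """One Q-learning step after forwarding a packet toward dst via relay."""
    key = (node, dst)
    old = q.setdefault(key, {}).setdefault(relay, 0.0)
    q[key][relay] = old + ALPHA * (reward(link_quality, delay, energy)
                                   + GAMMA * best_next_q - old)

def select_relay(q, node, dst, neighbors, eps=0.1):
    """Epsilon-greedy choice among the current one-hop neighbors (assumed non-empty)."""
    table = q.get((node, dst), {})
    if random.random() < eps or not table:
        return random.choice(neighbors)
    return max(neighbors, key=lambda n: table.get(n, 0.0))
```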
Vehicle Edge Computing (VEC) and Cloud Computing (CC) significantly enhance the processing efficiency of delay-sensitive and computation-intensive applications by offloading compute-intensive tasks from resource-constrained onboard devices to nearby Roadside Units (RSUs), thereby achieving lower delay and energy consumption. However, the limited storage capacity and energy budget of RSUs make it challenging to meet the demands of the highly dynamic Internet of Vehicles (IoV) environment, so determining reasonable service caching and computation offloading strategies is crucial. To address this, this paper proposes a joint service caching scheme for cloud-edge collaborative IoV computation offloading. By modeling the dynamic optimization problem as a Markov Decision Process (MDP), the scheme jointly optimizes task delay, energy consumption, load balancing, and privacy entropy to achieve better quality of service. Additionally, a dynamic adaptive multi-objective deep reinforcement learning algorithm is proposed. Each Double Deep Q-Network (DDQN) agent obtains rewards for different objectives based on distinct reward functions and dynamically updates the objective weights by learning the value changes between objectives with Radial Basis Function Networks (RBFNs), thereby efficiently approximating Pareto-optimal decisions for multiple objectives. Extensive experiments demonstrate that the proposed algorithm better coordinates the three-tier computing resources of cloud, edge, and vehicles. Compared to existing algorithms, it reduces task delay and energy consumption by 10.64% and 5.1%, respectively.
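As a rough illustration of the multi-objective scalarization step, the snippet below combines per-objective rewards with dynamically updated weights. The paper learns these weights with an RBF network, so the softmax-over-deficits rule here is a stand-in assumption, as are the objective names and the temperature.

```python
import numpy as np

OBJECTIVES = ["delay", "energy", "load_balance", "privacy_entropy"]

def dynamic_weights(recent_rewards, temperature=1.0):
    """Upweight objectives whose recent rewards are worst (stand-in for the RBFN)."""
    deficits = -np.array([np.mean(recent_rewards[o]) for o in OBJECTIVES])
    z = np.exp((deficits - deficits.max()) / temperature)
    return z / z.sum()

def scalarize(reward_vector, weights):
    """Scalar training signal used for a single DDQN update."""
    return float(np.dot(weights, reward_vector))

history = {"delay": [-0.6, -0.4], "energy": [-0.2, -0.1],
           "load_balance": [-0.3, -0.3], "privacy_entropy": [-0.1, -0.2]}
w = dynamic_weights(history)               # delay receives the largest weight here
print(w, scalarize([-0.5, -0.1, -0.3, -0.2], w))
```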
At present, energy consumption is one of the main bottlenecks in autonomous mobile robot development. To address the challenge of high energy consumption in path planning for autonomous mobile robots navigating unknown and complex environments, this paper proposes an Attention-Enhanced Dueling Deep Q-Network (AD-Dueling DQN), which integrates a multi-head attention mechanism and a prioritized experience replay strategy into a Dueling-DQN reinforcement learning framework. A multi-objective reward function centered on energy efficiency is designed to comprehensively consider path length, terrain slope, motion smoothness, and obstacle avoidance, enabling low-energy trajectory generation in 3D space from the outset. The multi-head attention mechanism allows the model to dynamically focus on energy-critical state features, such as slope gradients and obstacle density, thereby significantly improving its ability to recognize and avoid energy-intensive paths. Additionally, the prioritized experience replay mechanism accelerates learning from key decision-making experiences, suppressing inefficient exploration and guiding the policy toward low-energy solutions more rapidly. The effectiveness of the proposed path planning algorithm is validated through simulation experiments in multiple off-road scenarios. Results demonstrate that AD-Dueling DQN consistently achieves the lowest average energy consumption across all tested environments. Moreover, the proposed method exhibits faster convergence and greater training stability than baseline algorithms, highlighting its global optimization capability under energy-aware objectives in complex terrains. This study offers an efficient and scalable intelligent control strategy for the development of energy-conscious autonomous navigation systems.
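For readers who want the network shape, here is a minimal PyTorch sketch of an attention-enhanced dueling Q-network. Treating each state scalar as a token, the head count, and all layer sizes are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

class ADDuelingDQN(nn.Module):
    """Sketch: multi-head self-attention over state features, then dueling heads."""
    def __init__(self, n_features, d_model=64, n_heads=4, n_actions=8):
        super().__init__()
        self.embed = nn.Linear(1, d_model)   # each scalar feature becomes a token
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.value = nn.Sequential(nn.Linear(d_model, 64), nn.ReLU(), nn.Linear(64, 1))
        self.adv = nn.Sequential(nn.Linear(d_model, 64), nn.ReLU(), nn.Linear(64, n_actions))

    def forward(self, state):                # state: (batch, n_features)
        tokens = self.embed(state.unsqueeze(-1))      # (batch, n_features, d_model)
        attended, _ = self.attn(tokens, tokens, tokens)
        pooled = attended.mean(dim=1)                 # aggregate feature tokens
        v, a = self.value(pooled), self.adv(pooled)
        return v + a - a.mean(dim=1, keepdim=True)    # dueling aggregation
```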
To solve the problems of poor safety guarantees and insufficient training efficiency in conventional reinforcement learning methods for decision-making, this study proposes a hybrid framework that combines deep reinforcement learning with rule-based decision-making. A risk assessment model for lane-change maneuvers, considering uncertain predictions of surrounding vehicles, is established as a safety filter to improve learning efficiency while correcting dangerous actions for safety enhancement. On this basis, a Risk-fused DDQN is constructed using the model-based risk assessment and supervision mechanism. The proposed reinforcement learning algorithm sets up a separate experience buffer for dangerous trials and punishes such actions, which is shown to improve sampling efficiency and training outcomes. Compared with conventional DDQN methods, the proposed algorithm improves the convergence value of the cumulative reward by 7.6% and 2.2% in the two constructed simulation scenarios and reduces the number of training episodes by 52.2% and 66.8%, respectively. In real-vehicle tests, the success rate of lane changes is improved by 57.3% while the time headway is increased by at least 16.5%, confirming the higher training efficiency, scenario adaptability, and safety of the proposed Risk-fused DDQN.
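The dual-buffer and safety-filter ideas can be sketched as below. Buffer sizes, the 25% danger mixing ratio, and the 0.5 risk threshold are assumptions chosen for illustration, not values from the paper.

```python
import random
from collections import deque

safe_buf = deque(maxlen=50_000)      # ordinary transitions
danger_buf = deque(maxlen=10_000)    # dangerous trials kept in their own buffer

def store(transition, risky):
    (danger_buf if risky else safe_buf).append(transition)

def sample_batch(batch_size=64, danger_frac=0.25):
    """Mix punished dangerous trials into every batch (mixing ratio assumed)."""
    n_danger = min(int(batch_size * danger_frac), len(danger_buf))
    n_safe = min(batch_size - n_danger, len(safe_buf))
    return (random.sample(list(danger_buf), n_danger)
            + random.sample(list(safe_buf), n_safe))

def filtered_action(rl_action, risk, state, rule_based_action):
    """Safety filter: fall back to the rule-based action when assessed risk is high."""
    return rule_based_action(state) if risk(state, rl_action) > 0.5 else rl_action
```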
Cooperative multi-agent reinforcement learning (MARL) is a key technology for enabling cooperation in complex multi-agent systems. It has achieved remarkable progress in areas such as gaming, autonomous driving, and multi-robot control. Empowering cooperative MARL with multi-task decision-making capabilities is expected to further broaden its application scope. In multi-task scenarios, cooperative MARL algorithms need to address three types of multi-task problems: reward-related multi-task problems, arising from different reward functions; multi-domain multi-task problems, caused by differences in state and action spaces and in state transition functions; and scalability-related multi-task problems, resulting from dynamic variation in the number of agents. Most existing studies focus on scalability-related multi-task problems. However, with the increasing integration between large language models (LLMs) and multi-agent systems, a growing number of LLM-based multi-agent systems have emerged, enabling more complex multi-task cooperation. This paper provides a comprehensive review of the latest advances in this field. By combining multi-task reinforcement learning with cooperative MARL, we categorize and analyze the three major types of multi-task problems under multi-agent settings, offering fine-grained classifications and summarizing key insights for each. In addition, we summarize commonly used benchmarks and discuss future research directions in this area, which hold promise for further enhancing the multi-task cooperation capabilities of multi-agent systems and expanding their practical applications in the real world.
This paper investigates the challenges associated with Unmanned Aerial Vehicle (UAV) collaborative search and target tracking in dynamic, unknown environments characterized by a limited field of view. The primary objective is to explore unknown environments to locate and track targets effectively. To address this problem, we propose a novel Multi-Agent Reinforcement Learning (MARL) method based on Graph Neural Networks (GNNs). First, a method is introduced for encoding continuous-space multi-UAV problem data into spatial graphs, which establish essential relationships among agents, obstacles, and targets. Second, a Graph Attention Network (GAT) model is presented, which focuses exclusively on adjacent nodes and learns attention weights adaptively, allowing agents to better process information in dynamic environments. Reward functions are specifically designed to tackle exploration challenges in environments with sparse rewards. A framework integrating centralized training and distributed execution facilitates model advancement. Simulation results show that the proposed method outperforms an existing MARL method in search rate and tracking performance with fewer collisions. The experiments also show that the proposed method can be extended to applications with a larger number of agents, providing a potential solution to the challenging problem of multi-UAV autonomous tracking in dynamic, unknown environments.
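A single-head graph-attention layer of the kind described, attending only over adjacent nodes, might look like the following PyTorch sketch. The dimensions and the dense 0/1 adjacency representation are assumptions; the adjacency should include self-loops so each row has at least one neighbor.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GATLayer(nn.Module):
    """Minimal single-head graph attention restricted to adjacent nodes."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)
        self.a = nn.Linear(2 * out_dim, 1, bias=False)

    def forward(self, h, adj):
        # h: (N, in_dim) node features; adj: (N, N) 0/1 mask incl. self-loops
        z = self.W(h)
        n = z.size(0)
        pairs = torch.cat([z.unsqueeze(1).expand(n, n, -1),
                           z.unsqueeze(0).expand(n, n, -1)], dim=-1)
        scores = F.leaky_relu(self.a(pairs).squeeze(-1))       # (N, N) raw scores
        scores = scores.masked_fill(adj == 0, float('-inf'))   # neighbors only
        alpha = torch.softmax(scores, dim=-1)                  # adaptive attention
        return F.elu(alpha @ z)                                # aggregated features
```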
Exo-atmospheric vehicles are constrained by limited maneuverability, which creates a contradiction between evasive maneuvering and precision strike. To address the problem of Integrated Evasion and Impact (IEI) decision-making under multi-constraint conditions, a hierarchical intelligent decision-making method based on Deep Reinforcement Learning (DRL) is proposed. First, an intelligent decision-making framework of "DRL evasion decision" plus "impact prediction guidance decision" is established: it takes the impact-point deviation correction ability as the constraint and the maximum miss distance as the objective, effectively resolving the poor decision-making performance caused by the large IEI decision space. Second, to solve the sparse reward problem faced by evasion decision-making, a hierarchical decision-making method consisting of a maneuver timing decision and a maneuver duration decision is proposed, and the corresponding Markov Decision Process (MDP) is designed. A detailed simulation experiment is designed to analyze the advantages and computational complexity of the proposed method. Simulation results show that the proposed model achieves good performance with low computational resource requirements: the minimum miss distance is 21.3 m while guaranteeing impact-point accuracy, and the single decision-making time is 4.086 ms on an STM32F407 microcontroller, demonstrating engineering application value.
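The hierarchical decomposition described, one decision for maneuver timing and another for duration, can be pictured as two stacked policies. In the sketch below, the state fields, the threshold, and the candidate duration set are purely illustrative stand-ins for the learned DRL policies.

```python
import random

DURATIONS = [0.5, 1.0, 2.0]    # candidate maneuver durations in seconds (assumed)

def timing_policy(state):
    """Stand-in for the learned timing agent: maneuver as the interceptor closes."""
    return state["time_to_go"] < 8.0 and not state["maneuvering"]

def duration_policy(state):
    """Stand-in for the learned duration agent."""
    return random.choice(DURATIONS)

def decide(state):
    """Upper level picks whether to maneuver now; lower level picks for how long."""
    if timing_policy(state):
        return {"maneuver": True, "duration": duration_policy(state)}
    return {"maneuver": False, "duration": 0.0}

print(decide({"time_to_go": 6.0, "maneuvering": False}))
```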
Low Earth orbit (LEO) satellite networks exhibit distinct characteristics, e.g., limited resources on individual satellite nodes and a dynamic network topology, which pose many challenges for routing algorithms. To satisfy the quality of service (QoS) requirements of various users, it is critical to research efficient routing strategies that fully utilize satellite resources. This paper proposes a multi-QoS-information-optimized routing algorithm based on reinforcement learning for LEO satellite networks. Under limited satellite resources, it guarantees that services with high assurance demands are prioritized while considering the load-balancing performance of the network for services with low assurance demands, ensuring the full and effective utilization of satellite resources. An auxiliary path search algorithm is proposed to accelerate the convergence of the satellite routing algorithm. Simulation results show that the generated routing strategy can promptly process and fully meet the QoS demands of high-assurance services while effectively improving the load-balancing performance of the links.
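One way to read the dual treatment of high- and low-assurance traffic is the next-hop scoring sketch below: high-assurance flows follow the learned Q-values directly, while low-assurance flows trade learned value against link load. The weighting and the [0, 1] normalization are assumptions.

```python
def next_hop_score(q_value, link_load, assurance_high, w_load=0.6):
    """Score a candidate neighbor; q_value and link_load assumed in [0, 1]."""
    if assurance_high:
        return q_value                                  # prioritize the fastest path
    return (1 - w_load) * q_value - w_load * link_load  # load-balance the rest

def choose_next_hop(neighbors, q_table, loads, assurance_high):
    return max(neighbors,
               key=lambda n: next_hop_score(q_table[n], loads[n], assurance_high))

q = {"sat_a": 0.9, "sat_b": 0.7}
loads = {"sat_a": 0.95, "sat_b": 0.2}
print(choose_next_hop(["sat_a", "sat_b"], q, loads, assurance_high=False))  # sat_b
```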
Small modular reactors (SMRs) are at the research forefront of nuclear reactor technology. Advances in intelligent control technologies are paving a new way toward the design and construction of unmanned SMRs. The autonomous control process of an SMR can be divided into three stages: state diagnosis, autonomous decision-making, and coordinated control. In this paper, the autonomous state recognition and task planning of an unmanned SMR are investigated. An operating-condition recognition method based on a knowledge base of SMR operation is proposed using artificial neural network (ANN) technology, which provides a basis for the state judgment needed in intelligent reactor control path planning. An improved reinforcement learning path planning algorithm is utilized to implement path transfer decision-making; this algorithm performs condition transitions with minimal cost under specified modes. In summary, a full-range intelligent control-path decision-planning technology for SMRs is realized, providing a theoretical basis for the design and construction of unmanned SMRs in the future.
The integration of artificial intelligence into the development and production of mechatronic products offers a substantial opportunity to enhance efficiency, adaptability, and system performance. This paper examines the utilization of reinforcement learning as a control strategy, with a particular focus on its deployment in pivotal stages of the product development lifecycle, specifically between system architecture and system integration and verification. A controller based on reinforcement learning was developed and evaluated against traditional proportional-integral controllers in dynamic and fault-prone environments. The results illustrate the superior adaptability, stability, and optimization potential of the reinforcement learning approach, particularly in addressing dynamic disturbances and ensuring robust performance. The study further shows how reinforcement learning can facilitate the transition from conceptual design to implementation by automating optimization processes, enabling interface automation, and enhancing system-level testing. Based on these findings, the paper presents future directions for research, including the integration of domain-specific knowledge into the reinforcement learning process and its validation in real-world environments. The results underscore the potential of artificial-intelligence-driven methodologies to revolutionize the design and deployment of intelligent mechatronic systems.
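For context, the proportional-integral baseline referred to here is the classic discrete PI loop sketched below; the gains and timestep are placeholders, and the reinforcement learning controller would replace this fixed `step` mapping with a learned policy.

```python
class PIController:
    """Classic discrete-time PI controller (gains and dt are placeholders)."""
    def __init__(self, kp=1.0, ki=0.1, dt=0.01):
        self.kp, self.ki, self.dt = kp, ki, dt
        self.integral = 0.0

    def step(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.dt       # accumulated error drives the I term
        return self.kp * error + self.ki * self.integral
```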
Depression is a prevalent mental health disorder characterized by high relapse rates, highlighting the need for effective preventive interventions. This paper reviews the potential of reinforcement learning (RL) in preventing depression relapse. RL, a subset of artificial intelligence, utilizes machine learning algorithms to analyze behavioral data, enabling early detection of relapse risk and optimization of personalized interventions. RL's ability to tailor treatment in real time by adapting to individual needs and responses offers a dynamic alternative to traditional therapeutic approaches. Studies have demonstrated the efficacy of RL in customizing e-Health interventions and integrating mobile sensing with machine learning for adaptive mental health systems. Despite these advantages, challenges remain in algorithmic complexity, ethical considerations, and clinical implementation. Addressing these issues is crucial for the successful integration of RL into mental health care. This paper concludes with recommendations for future research directions, emphasizing the need for larger-scale studies and interdisciplinary collaboration to fully realize RL's potential in improving mental health outcomes and preventing depression relapse.
The challenge of enhancing the generalization capacity of reinforcement learning (RL) agents remains a formidable obstacle. Existing RL methods, despite achieving superhuman performance on certain benchmarks, often struggle with this aspect. A potential reason is that the benchmarks used for training and evaluation may not adequately offer a diverse set of transferable tasks. Although recent studies have developed benchmarking environments to address this shortcoming, they typically fall short in providing tasks that both ensure a solid foundation for generalization and exhibit significant variability. To overcome these limitations, this work introduces the concept that "objects are composed of more fundamental components" in environment design, implemented in the proposed environment Summon the Magic (StM). This environment generates tasks in which objects are derived from extensible and shareable basic components, facilitating strategy reuse and enhancing generalization. Furthermore, two new metrics, the adaptation sensitivity range (ASR) and the parameter correlation coefficient (PCC), are proposed to better capture and evaluate the generalization process of RL agents. Experimental results show that increasing the number of basic components per object reduces the proximal policy optimization (PPO) agent's training-testing gap by 60.9% (in episode reward), significantly alleviating overfitting. Additionally, linear variations in other environmental factors, such as the training monster-set proportion and the total number of basic components, uniformly decrease the gap by at least 32.1%. These results highlight StM's effectiveness in benchmarking and probing the generalization capabilities of RL algorithms.
In multi-UAV (Unmanned Aerial Vehicle) systems, achieving efficient navigation is essential for executing complex tasks and enhancing autonomy. Traditional navigation methods depend on predefined control strategies and trajectory planning and often perform poorly in complex environments. To improve UAV-environment interaction efficiency, this study proposes a multi-UAV integrated navigation algorithm based on Deep Reinforcement Learning (DRL) that fuses information from the Inertial Navigation System (INS), the Global Navigation Satellite System (GNSS), and the Visual Navigation System (VNS). Specifically, an improved multi-UAV integrated navigation algorithm called Information Fusion with Multi-Agent Deep Deterministic Policy Gradient (IF-MADDPG) was developed, which enables UAVs to learn collaboratively and optimize their flight trajectories in real time. Through simulations and experiments, test scenarios in GNSS-denied environments were constructed to evaluate the effectiveness of the algorithm. The experimental results demonstrate that the IF-MADDPG algorithm significantly enhances the collaborative navigation capabilities of multiple UAVs in formation maintenance and GNSS-denied environments, and that it offers advantages in mission completion time. This study provides a novel approach for efficient collaboration in multi-UAV systems, significantly improving the robustness and adaptability of navigation systems.
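The fusion step can be illustrated with a simple inverse-variance combination of per-sensor position estimates feeding each agent's observation. The actual IF-MADDPG fusion rule is not specified at this level of detail, so treat this as an assumption.

```python
import numpy as np

def fuse_positions(estimates):
    """Inverse-variance fusion of (position_xyz, variance) pairs from the
    available sensors; in GNSS-denied scenarios only INS/VNS entries remain."""
    weights = np.array([1.0 / var for _, var in estimates])
    weights /= weights.sum()
    positions = np.stack([pos for pos, _ in estimates])
    return weights @ positions      # variance-weighted fused position

ins = (np.array([10.2, 5.1, 30.0]), 4.0)   # drifting but always available
vns = (np.array([10.0, 5.0, 29.5]), 1.0)   # accurate near mapped features
print(fuse_positions([ins, vns]))           # fused observation, GNSS-denied case
```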
Blockchain technology, based on decentralized data storage and distributed consensus design, has become a promising solution for addressing data security risks and providing privacy protection in the Internet of Things (IoT), thanks to its tamper-proof and non-repudiation features. Although blockchain typically does not require the endorsement of third-party trust organizations, it usually needs to perform substantial mathematical calculations to prevent malicious attacks, which imposes strict computation-resource requirements on the participating devices. Offloading the computation tasks required to support blockchain consensus to edge service nodes or the cloud, while providing data privacy protection for IoT applications, can effectively address the limited computation and energy resources of IoT devices. However, how to make reasonable offloading decisions for IoT devices remains an open issue. Leveraging the self-learning ability of Reinforcement Learning (RL), this paper proposes an RL-enabled Swarm Intelligence Optimization Algorithm (RLSIOA) that aims to improve the quality of initial solutions and efficiently optimize computation-task offloading decisions. The algorithm considers various factors that may affect the revenue obtained by IoT devices executing consensus algorithms (e.g., Proof-of-Work) and optimizes both the proportion of sub-tasks to be offloaded and the scale of computing resources to be rented from the edge and cloud, maximizing device revenue. Experimental results show that RLSIOA obtains higher-quality offloading decision schemes at lower latency costs than representative benchmark algorithms.
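The revenue trade-off being optimized can be pictured with the toy objective below: consensus reward decays with finish time, while local energy and edge/cloud rental costs grow with the work kept local and the resources rented. Every coefficient and the reward-decay shape are assumptions for illustration.

```python
def device_revenue(offload_frac, rented_cpu, task_cycles,
                   mining_reward=10.0, local_power=0.8,
                   local_speed=1.0, rent_price=0.02):
    """Toy revenue for one device: reward earned minus energy and rental costs."""
    local_cycles = (1 - offload_frac) * task_cycles
    energy_cost = local_power * local_cycles / local_speed
    rental_cost = rent_price * rented_cpu
    # finishing sooner (more offloading / more rented CPU) raises expected reward
    finish_time = max(local_cycles / local_speed,
                      offload_frac * task_cycles / max(rented_cpu, 1e-6))
    return mining_reward / (1.0 + finish_time) - energy_cost - rental_cost

print(device_revenue(offload_frac=0.7, rented_cpu=4.0, task_cycles=10.0))
```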
Efficient edge caching is essential for maximizing utility in video streaming systems, especially under constraints such as limited storage capacity and dynamically fluctuating content popularity. Utility, defined as the benefit obtained per unit of cache bandwidth usage, degrades when static or greedy caching strategies fail to adapt to changing demand patterns. To address this, we propose a deep reinforcement learning (DRL)-based caching framework built upon the proximal policy optimization (PPO) algorithm. Our approach formulates edge caching as a sequential decision-making problem and introduces a reward model that balances cache hit performance and utility by prioritizing high-demand, high-quality content while penalizing degraded-quality delivery. We construct a realistic synthetic dataset that captures both temporal variations and shifting content popularity to validate our model. Experimental results demonstrate that the proposed method improves utility by up to 135.9% and achieves an average improvement of 22.6% compared to traditional greedy algorithms and long short-term memory (LSTM)-based prediction models. Moreover, our method consistently performs well across a variety of utility functions, workload distributions, and storage limitations, underscoring its adaptability and robustness in dynamic video caching environments.
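A reward model with the described shape might look like this sketch, where hits earn demand-weighted, quality-weighted utility per unit of bandwidth and degraded delivery is penalized. The miss penalty and the coefficient `beta` are assumptions, not the paper's values.

```python
def caching_reward(hit, demand, quality_served, quality_requested,
                   bandwidth_used, beta=0.5):
    """Reward one cache decision: utility per bandwidth, minus quality degradation."""
    if not hit:
        return -0.1                           # small miss penalty (assumed)
    utility = demand * quality_served / max(bandwidth_used, 1e-6)
    degradation = beta * max(0.0, quality_requested - quality_served)
    return utility - degradation
```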
Vehicular Edge Computing (VEC) enhances the quality of user services by deploying a wealth of resources near vehicles. However, due to the highly dynamic and complex nature of vehicular networks, centralized decision-making for resource allocation proves inadequate within VEC; conversely, allocating resources via distributed decision-making consumes vehicular resources. To improve the quality of user service, we formulate a latency-minimization problem and subdivide it into two subproblems to be solved through distributed decision-making. To mitigate the resource consumption caused by distributed decision-making, we propose a Reinforcement Learning (RL) algorithm based on a sequential alternating multi-agent mechanism, which effectively reduces the dimensionality of the action space without losing the informational content of actions, achieving network lightweighting. We discuss the rationality, generalizability, and inherent advantages of the proposed mechanism. Simulation results indicate that it outperforms traditional RL algorithms in stability, generalizability, and adaptability to scenarios with invalid actions, all while achieving network lightweighting.
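The sequential alternating idea can be sketched as agents deciding one at a time, each conditioning on the actions already chosen this round, so no agent ever searches the full joint action space. The agent interface and the channel example are placeholders, not the paper's setup.

```python
import random

class StubAgent:
    """Placeholder for a trained RL policy; acts randomly here."""
    def act(self, obs, actions):
        return random.choice(actions)

def sequential_round(agents, state, valid_actions):
    """Agents decide in turn; later agents observe earlier choices."""
    chosen = []
    for agent in agents:
        obs = (state, tuple(chosen))           # observation includes prior actions
        chosen.append(agent.act(obs, valid_actions(state, chosen)))
    return chosen

# Example: three agents pick channels; a channel taken once becomes invalid.
free = lambda state, chosen: [c for c in state["channels"] if c not in chosen]
print(sequential_round([StubAgent()] * 3, {"channels": [0, 1, 2, 3]}, free))
```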
The high maneuverability of modern fighters in close air combat imposes significant cognitive demands on pilots, making rapid, accurate decision-making challenging. While reinforcement learning (RL) has shown promise in this domain, existing methods often lack strategic depth and generalization in complex, high-dimensional environments. To address these limitations, this paper proposes an optimized self-play method enhanced by advancements in fighter modeling, neural network design, and algorithmic frameworks. Unlike traditional 3-DOF models, this study employs a six-degree-of-freedom (6-DOF) F-16 fighter model based on open-source aerodynamic data, featuring airborne equipment and a realistic visual simulation platform. To capture temporal dynamics, Long Short-Term Memory (LSTM) layers are integrated into the neural network, complemented by delayed input stacking. The RL environment incorporates expert strategies, curiosity-driven rewards, and curriculum learning to improve adaptability and strategic decision-making. Experimental results demonstrate that the proposed approach achieves a winning rate exceeding 90% against classical single-agent methods. Additionally, using enhanced 3D visual platforms, we conducted human-agent confrontation experiments in which the agent attained an average winning rate of over 75%. The agent's maneuver trajectories closely align with human pilot strategies, showcasing its potential in decision-making and pilot-training applications. This study highlights the effectiveness of integrating advanced modeling and self-play techniques in developing robust air combat decision-making systems.
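The recurrent head with delayed input stacking could be shaped as in this PyTorch sketch, where each timestep's observation is concatenated with its immediate predecessors before entering the LSTM. The stack depth and all sizes are assumptions, not the paper's design.

```python
import torch
import torch.nn as nn

class LSTMPolicy(nn.Module):
    """Sketch of a recurrent policy with stacked delayed inputs (sizes assumed)."""
    def __init__(self, obs_dim, stack=4, hidden=128, n_actions=10):
        super().__init__()
        self.stack = stack
        self.lstm = nn.LSTM(obs_dim * stack, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_actions)

    def forward(self, obs_seq, hc=None):
        # obs_seq: (batch, time, obs_dim); pad by repeating the first frame,
        # then stack each step with its stack-1 predecessors along features.
        pad = obs_seq[:, :1].repeat(1, self.stack - 1, 1)
        padded = torch.cat([pad, obs_seq], dim=1)
        stacked = torch.cat([padded[:, i:i + obs_seq.size(1)]
                             for i in range(self.stack)], dim=-1)
        out, hc = self.lstm(stacked, hc)
        return self.head(out), hc       # per-step action logits, recurrent state
```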
Smart learning environments are considered vital resources and essential needs in modern digital education systems. With the rapid proliferation of smart and assistive technologies, smart learning processes have become convenient, comfortable, and financially affordable. This shift has led to the emergence of pervasive computing environments in which users' intelligent behavior is supported by smart gadgets; however, such support is becoming more challenging due to the inconsistent behavior of artificial intelligence (AI) assistive technologies in terms of networking issues, slow user responses to technologies, and limited computational resources. This paper presents a context-aware, predictive-reasoning-based formalism for smart learning environments that helps students manage their academic as well as extra-curricular activities autonomously, with limited human intervention. The system consists of a three-tier architecture: autonomously acquiring contextualized information from the environment, modeling the system using the Web Ontology Rule Language (OWL 2 RL) and the Semantic Web Rule Language (SWRL), and performing reasoning to infer the desired goals whenever and wherever needed. For contextual reasoning, we develop a non-monotonic formalism that reasons over contextual information using rule-based reasoning. The focus is on distributed problem solving, where context-aware agents exchange information using rule-based reasoning and specify constraints to accomplish desired goals. To formally model-check and simulate the system behavior, we model a smart learning environment case study in the UPPAAL model checker and verify desired properties of the model, such as safety, liveness, and robustness properties, to reflect the overall correctness of the system, achieving a minimum analysis time of 0.002 s and 34,712 KB of memory utilization.
Urban expansion has far-reaching implications for the economy, environment, and socio-cultural aspects of a city. It is therefore essential to thoroughly understand the complex dynamics and driving factors behind urban expansion in order to make informed decisions that promote long-term urban sustainability. Currently, cellular automata (CA) and agent-based modeling (ABM) are widely employed to simulate urban land growth. However, existing research lacks a comprehensive consideration of the influence of individual agent attributes and land population capacity on site-selection decisions. Consequently, we propose a novel approach that incorporates fine-scale population data into the site-selection decision simulation, allowing a granular depiction of individual decision attributes. Moreover, the site-selection process integrates assessment criteria including population capacity and neighborhood development status. Furthermore, to address the issue of fragmented simulated residential land-use outcomes, population redistribution is conducted iteratively. Additionally, by integrating extended reinforcement learning mechanisms, the site-selection process of the residential multi-agent system achieves a significant improvement in overall simulation accuracy. The proposed model was applied to simulate urban expansion in Shenzhen, Guangdong province, China. The results demonstrate that the model effectively enhances the behavioral decision-making capabilities of intelligent agents, thereby providing insights into the mechanisms underlying urban expansion. These findings hold considerable significance for informed urban planning decisions and for advancing sustainable urban development.
In recent years, significant research attention has been directed toward swarm intelligence. The milling behavior of fish schools, a prime example of swarm intelligence, shows how simple rules followed by individual agents lead to complex collective behaviors. This paper applies multi-agent reinforcement learning to simulate fish schooling behavior, overcoming the difficulty of tuning parameters in traditional models and addressing the limitations of single-agent methods in multi-agent environments. On this foundation, a novel GCN-Critic MADDPG algorithm leveraging Graph Convolutional Networks (GCNs) is proposed to enhance cooperation among agents in a multi-agent system. Simulation experiments demonstrate that, compared to traditional single-agent algorithms, the proposed method not only exhibits significant advantages in convergence speed and stability but also achieves tighter group formations and more naturally aligned milling behavior. Additionally, a fish-school self-organizing behavior research platform based on an event-triggered mechanism has been developed, providing a robust tool for exploring dynamic behavioral changes under various conditions.
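A graph-convolutional critic of the kind named here might, as a sketch, apply the standard mean-aggregation GCN rule over a neighbor graph of the school before scoring the joint state. The dimensions, the dense adjacency with self-loops, and the mean pooling are assumptions.

```python
import torch
import torch.nn as nn

class GCNCritic(nn.Module):
    """Sketch of a graph-convolutional centralized critic: each fish/agent is a
    node whose edges link nearby neighbors (layout is an assumption)."""
    def __init__(self, in_dim, hidden=64):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, hidden)
        self.fc2 = nn.Linear(hidden, hidden)
        self.q = nn.Linear(hidden, 1)

    def forward(self, x, adj):
        # x: (N, in_dim) per-agent state/action features; adj: (N, N) w/ self-loops
        deg = adj.sum(dim=-1, keepdim=True).clamp(min=1.0)
        a_norm = adj / deg                       # row-normalized propagation
        h = torch.relu(self.fc1(a_norm @ x))
        h = torch.relu(self.fc2(a_norm @ h))
        return self.q(h.mean(dim=0))             # joint value of the whole school
```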
Funding notes for the studies above follow.
Funding (RL-based routing for FANETs): Hung Yen University of Technology and Education, grant number UTEHY.L.2025.62.
Funding (cloud-edge collaborative IoV computation offloading): Key Science and Technology Program of Henan Province, China (Grant Nos. 242102210147, 242102210027); Fujian Province Young and Middle-aged Teacher Education Research Project (Science and Technology Category) (No. JZ240101). Corresponding author: Dong Yuan.
Funding (Risk-fused DDQN for lane-change decision-making): National Key Research and Development Program of China (Grant No. 2022YFE0117100); National Science Foundation of China (Grant Nos. 52102468, 52325212); Fundamental Research Funds for the Central Universities.
Funding (multi-task cooperative MARL survey): National Natural Science Foundation of China (62136008, 62293541); Beijing Natural Science Foundation (4232056); Beijing Nova Program (20240484514).
Funding (multi-UAV collaborative search and target tracking): National Natural Science Foundation of China (Nos. 12272104, U22B2013).
Funding (integrated evasion and impact decision-making): co-supported by the National Natural Science Foundation of China (No. 62103432); China Postdoctoral Science Foundation (No. 284881); Young Talent Fund of the University Association for Science and Technology in Shaanxi, China (No. 20210108).
Funding (LEO satellite multi-QoS routing): National Key Research and Development Program (2021YFB2900604).
Funding (StM generalization environment): National Key R&D Program of China (No. 2023YFB4502200); National Natural Science Foundation of China (Nos. U22A2028, 61925208, 62222214, 62341411, 62102398, 62102399, U20A20227, 62302478, 62302482, 62302483, 62302480, 62302481); Strategic Priority Research Program of the Chinese Academy of Sciences (Nos. XDB0660300, XDB0660301, XDB0660302); Chinese Academy of Sciences Project for Young Scientists in Basic Research (No. YSBR-029); Youth Innovation Promotion Association of the Chinese Academy of Sciences; Xplore Prize.
Funding (IF-MADDPG multi-UAV integrated navigation): co-supported by the National Natural Science Foundation of China (Nos. 92371201 and 52192633); Natural Science Foundation of Shaanxi Province of China (No. 2022JC-03); Aeronautical Science Foundation of China (No. ASFC-20220019070002).
Funding (RLSIOA blockchain computation offloading): Project of Science and Technology Research Program of Chongqing Education Commission of China (No. KJZD-K202401105); High-Quality Development Action Plan for Graduate Education at Chongqing University of Technology (Nos. gzljg2023308, gzljd2024204); Graduate Innovation Program of Chongqing University of Technology (No. gzlcx20233197); Yunnan Provincial Key R&D Program (202203AA080006).
文摘Blockchain technology,based on decentralized data storage and distributed consensus design,has become a promising solution to address data security risks and provide privacy protection in the Internet-of-Things(IoT)due to its tamper-proof and non-repudiation features.Although blockchain typically does not require the endorsement of third-party trust organizations,it mostly needs to perform necessary mathematical calculations to prevent malicious attacks,which results in stricter requirements for computation resources on the participating devices.By offloading the computation tasks required to support blockchain consensus to edge service nodes or the cloud,while providing data privacy protection for IoT applications,it can effectively address the limitations of computation and energy resources in IoT devices.However,how to make reasonable offloading decisions for IoT devices remains an open issue.Due to the excellent self-learning ability of Reinforcement Learning(RL),this paper proposes a RL enabled Swarm Intelligence Optimization Algorithm(RLSIOA)that aims to improve the quality of initial solutions and achieve efficient optimization of computation task offloading decisions.The algorithm considers various factors that may affect the revenue obtained by IoT devices executing consensus algorithms(e.g.,Proof-of-Work),it optimizes the proportion of sub-tasks to be offloaded and the scale of computing resources to be rented from the edge and cloud to maximize the revenue of devices.Experimental results show that RLSIOA can obtain higher-quality offloading decision-making schemes at lower latency costs compared to representative benchmark algorithms.
Abstract: Efficient edge caching is essential for maximizing utility in video streaming systems, especially under constraints such as limited storage capacity and dynamically fluctuating content popularity. Utility, defined as the benefit obtained per unit of cache bandwidth usage, degrades when static or greedy caching strategies fail to adapt to changing demand patterns. To address this, we propose a deep reinforcement learning (DRL)-based caching framework built upon the proximal policy optimization (PPO) algorithm. Our approach formulates edge caching as a sequential decision-making problem and introduces a reward model that balances cache hit performance and utility by prioritizing high-demand, high-quality content while penalizing degraded-quality delivery. We construct a realistic synthetic dataset that captures both temporal variations and shifting content popularity to validate our model. Experimental results demonstrate that the proposed method improves utility by up to 135.9% and achieves an average improvement of 22.6% over traditional greedy algorithms and long short-term memory (LSTM)-based prediction models. Moreover, the method performs consistently well across a variety of utility functions, workload distributions, and storage limits, underscoring its adaptability and robustness in dynamic video caching environments.
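A minimal sketch of a reward in the spirit described here, rewarding high-demand, high-quality hits per unit of bandwidth and penalizing degraded delivery, might look as follows. The weights, the utility form, and the quality scale are illustrative assumptions rather than the paper's exact reward model.

```python
# Toy caching reward: utility-weighted hits minus quality-degradation penalty.
def caching_reward(hit: bool, demand: float, quality_served: float,
                   quality_requested: float, bandwidth_used: float,
                   w_hit: float = 1.0, w_penalty: float = 0.5) -> float:
    if not hit:
        return -w_penalty * demand          # a miss on popular content hurts more
    # Utility as benefit per unit of cache bandwidth usage.
    utility = demand * quality_served / max(bandwidth_used, 1e-6)
    degradation = max(quality_requested - quality_served, 0.0)
    return w_hit * utility - w_penalty * degradation

# Example: a popular request served at slightly reduced quality.
print(caching_reward(hit=True, demand=0.9, quality_served=0.8,
                     quality_requested=1.0, bandwidth_used=1.2))  # 0.5
```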
Funding: supported by the National Natural Science Foundation of China (62271096, U20A20157), the Science and Technology Research Program of Chongqing Municipal Education Commission (KJQN202000626), the Natural Science Foundation of Chongqing, China (cstc2020jcyjzdxmX0024), the University Innovation Research Group of Chongqing (CXQT20017), the Youth Innovation Group Support Program of ICE Discipline of CQUPT (SCIE-QN-2022-04), the Chongqing Postdoctoral Science Special Foundation (2021XM3058), and the Chongqing Postgraduate Research and Innovation Project (CYB22250).
Abstract: Vehicular Edge Computing (VEC) enhances the quality of user services by deploying a wealth of resources near vehicles. However, because vehicular networks are highly dynamic and complex, centralized decision-making for resource allocation proves inadequate in VEC; conversely, allocating resources through distributed decision-making consumes vehicular resources. To improve the quality of user service, we formulate a latency-minimization problem and subdivide it into two subproblems to be solved through distributed decision-making. To mitigate the resource consumption caused by distributed decision-making, we propose a Reinforcement Learning (RL) algorithm based on a sequential alternating multi-agent mechanism, which effectively reduces the dimensionality of the action space without losing the informational content of actions, thereby achieving network lightweighting. We discuss the rationality, generalizability, and inherent advantages of the proposed mechanism. Simulation results indicate that it outperforms traditional RL algorithms in stability, generalizability, and adaptability to scenarios with invalid actions, all while achieving network lightweighting.
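One way to read "sequential alternating" action selection is that agents act one at a time, each conditioning on the choices already made, so the joint action space factorizes instead of growing multiplicatively. The sketch below illustrates that idea; the two-agent split, network sizes, and action semantics are illustrative assumptions, not the paper's design.

```python
# Sequential action selection: agent 2 conditions on agent 1's choice,
# so the 4x3 joint action space is never enumerated directly.
import torch
import torch.nn as nn

class SequentialAgent(nn.Module):
    def __init__(self, obs_dim, prev_act_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + prev_act_dim, 64), nn.ReLU(),
            nn.Linear(64, n_actions))

    def forward(self, obs, prev_actions):
        return self.net(torch.cat([obs, prev_actions], dim=-1))

obs = torch.randn(1, 8)
# Agent 1 (e.g., offloading target) acts first, with no predecessor.
a1_logits = SequentialAgent(8, 0, 4)(obs, torch.zeros(1, 0))
a1 = torch.nn.functional.one_hot(a1_logits.argmax(-1), 4).float()
# Agent 2 (e.g., bandwidth share) conditions on agent 1's choice.
a2_logits = SequentialAgent(8, 4, 3)(obs, a1)
print(a1_logits.shape, a2_logits.shape)  # (1, 4) and (1, 3)
```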
Funding: co-supported by the National Natural Science Foundation of China (No. 91852115).
Abstract: The high maneuverability of modern fighters in close air combat imposes significant cognitive demands on pilots, making rapid, accurate decision-making challenging. While reinforcement learning (RL) has shown promise in this domain, existing methods often lack strategic depth and generalization in complex, high-dimensional environments. To address these limitations, this paper proposes an optimized self-play method enhanced by advances in fighter modeling, neural network design, and algorithmic frameworks. Unlike traditional 3-DOF models, this study employs a six-degree-of-freedom (6-DOF) F-16 fighter model based on open-source aerodynamic data, featuring airborne equipment and a realistic visual simulation platform. To capture temporal dynamics, Long Short-Term Memory (LSTM) layers are integrated into the neural network, complemented by delayed input stacking. The RL environment incorporates expert strategies, curiosity-driven rewards, and curriculum learning to improve adaptability and strategic decision-making. Experimental results demonstrate that the proposed approach achieves a winning rate exceeding 90% against classical single-agent methods. Additionally, in human-agent confrontation experiments conducted on an enhanced 3D visual platform, the agent attained an average winning rate of over 75%. The agent's maneuver trajectories closely align with human pilot strategies, showcasing its potential in decision-making and pilot-training applications. This study highlights the effectiveness of integrating advanced modeling and self-play techniques in developing robust air-combat decision-making systems.
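The following sketch shows one plausible reading of an LSTM policy with delayed input stacking: the last few raw state snapshots are concatenated before entering the recurrent layer. The stack depth, state dimension, and discrete action head are illustrative assumptions, not the paper's exact architecture.

```python
# LSTM policy with delayed input stacking (toy dimensions).
from collections import deque
import torch
import torch.nn as nn

class LSTMPolicy(nn.Module):
    def __init__(self, state_dim=12, stack=4, hidden=64, n_actions=9):
        super().__init__()
        self.stack = stack
        self.lstm = nn.LSTM(state_dim * stack, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_actions)
        self.history = deque(maxlen=stack)

    def forward(self, state):
        # Delayed input stacking: feed the last `stack` raw states.
        self.history.append(state)
        while len(self.history) < self.stack:
            self.history.append(state)  # pad at episode start
        x = torch.cat(list(self.history), dim=-1).unsqueeze(1)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])

policy = LSTMPolicy()
logits = policy(torch.randn(1, 12))  # one flight-state snapshot
print(logits.shape)                  # torch.Size([1, 9])
```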
Funding: supported by the National Research Foundation (NRF), Republic of Korea, under project BK21 FOUR (4299990213939).
Abstract: Smart learning environments are considered vital components of, and essential needs in, modern digital education systems. With the rapid proliferation of smart and assistive technologies, smart learning processes have become convenient, comfortable, and financially affordable. This shift has led to the emergence of pervasive computing environments, where users' intelligent behavior is supported by smart gadgets; however, it is becoming more challenging owing to the inconsistent behavior of Artificial Intelligence (AI) assistive technologies, arising from networking issues, slow user responses to the technologies, and limited computational resources. This paper presents a context-aware, predictive-reasoning-based formalism for smart learning environments that helps students manage their academic and extra-curricular activities autonomously, with limited human intervention. The system consists of a three-tier architecture: autonomously acquiring contextualized information from the environment, modeling the system using the Web Ontology Rule Language (OWL 2 RL) and the Semantic Web Rule Language (SWRL), and performing reasoning to infer the desired goals whenever and wherever needed. For contextual reasoning, we develop a non-monotonic, rule-based formalism for reasoning over contextual information. The focus is on distributed problem solving, in which context-aware agents exchange information through rule-based reasoning and specify constraints to accomplish desired goals. To formally model-check and simulate the system's behavior, we model a smart-learning-environment case study in the UPPAAL model checker and verify desired properties such as safety, liveness, and robustness, reflecting the overall correctness of the system, with a minimum analysis time of 0.002 s and 34,712 KB of memory utilization.
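To give a feel for non-monotonic rule-based contextual reasoning of the kind described, the sketch below implements forward chaining with defeasible rules: a default conclusion is withheld when a defeating fact appears, so adding information can remove inferences. The facts and rules are invented for illustration and are not drawn from the paper's OWL 2 RL/SWRL ontology.

```python
# Forward chaining with defeasible (non-monotonic) rules.
facts = {"lecture_scheduled", "student_on_campus"}

# Each rule: (premises, conclusion, defeaters). The conclusion is derived
# only if all premises hold and no defeater is present.
rules = [
    ({"lecture_scheduled", "student_on_campus"}, "attend_lecture",
     {"lecture_cancelled"}),
    ({"attend_lecture"}, "silence_phone", set()),
]

def forward_chain(facts, rules):
    changed = True
    while changed:
        changed = False
        for premises, conclusion, defeaters in rules:
            if (premises <= facts and not (defeaters & facts)
                    and conclusion not in facts):
                facts.add(conclusion)
                changed = True
    return facts

print(forward_chain(set(facts), rules))
# Adding "lecture_cancelled" defeats the default, so neither
# attend_lecture nor silence_phone is derived:
print(forward_chain(set(facts) | {"lecture_cancelled"}, rules))
```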
Abstract: Urban expansion has far-reaching implications for the economy, environment, and socio-cultural fabric of a city. A thorough understanding of the complex dynamics and driving factors behind urban expansion is therefore essential for making informed decisions that promote a city's long-term sustainability. Cellular automata (CA) and agent-based modeling (ABM) are currently widely employed to simulate urban land growth. However, existing research does not comprehensively consider the influence of individual agent attributes and land population capacity on site-selection decisions. Consequently, we propose a novel approach that incorporates fine-scale population data into the site-selection simulation, allowing a granular depiction of individual decision attributes. The site-selection process integrates assessment criteria including population capacity and neighborhood development status. Furthermore, to address the fragmentation of simulated residential land-use outcomes, population redistribution is conducted iteratively. By integrating extended reinforcement-learning mechanisms, the site-selection process of the residential multi-agent system achieves a significant improvement in overall simulation accuracy. The proposed model was applied to simulate urban expansion in Shenzhen, Guangdong province, China. The results demonstrate that the model effectively enhances the behavioral decision-making capabilities of intelligent agents, providing insight into the mechanisms underlying urban expansion. These findings are of considerable significance for informed urban-planning decisions and for advancing sustainable urban development.
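The sketch below illustrates an RL-augmented site-selection step on a toy grid, with a reward that combines the two criteria named above: remaining population capacity and the development status of the neighborhood. The grid size, thresholds, and learning constants are illustrative assumptions, not the paper's calibration.

```python
# Epsilon-greedy site selection on a toy grid with capacity-aware reward.
import numpy as np

rng = np.random.default_rng(1)
size = 8
capacity = rng.integers(1, 5, (size, size)).astype(float)  # per-cell capacity
developed = np.zeros((size, size), dtype=bool)
developed[4, 4] = True  # seed settlement
q_values = np.zeros((size, size))

def reward(r, c):
    nbrs = developed[max(r - 1, 0):r + 2, max(c - 1, 0):c + 2].sum()
    return capacity[r, c] + 0.5 * nbrs  # capacity + neighborhood status

for step in range(200):
    if rng.random() < 0.1:                       # explore
        r, c = rng.integers(size), rng.integers(size)
    else:                                        # exploit learned values
        masked = np.where(developed, -np.inf, q_values)
        r, c = np.unravel_index(masked.argmax(), masked.shape)
    if developed[r, c]:
        continue
    q_values[r, c] += 0.2 * (reward(r, c) - q_values[r, c])
    if q_values[r, c] > 2.0:                     # agent commits to the site
        developed[r, c] = True
        capacity[r, c] = 0.0                     # population redistributed

print(developed.sum(), "cells developed")
```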
Funding: supported by the National Natural Science Foundation of China under Grant 62273351 and Grant 62303020.
Abstract: In recent years, significant research attention has been directed toward swarm intelligence. The milling behavior of fish schools, a prime example of swarm intelligence, shows how simple rules followed by individual agents lead to complex collective behaviors. This paper applies Multi-Agent Reinforcement Learning to simulate fish-schooling behavior, overcoming the difficulty of tuning parameters in traditional models and addressing the limitations of single-agent methods in multi-agent environments. On this foundation, a novel Graph Convolutional Network (GCN)-Critic MADDPG algorithm is proposed to enhance cooperation among agents in a multi-agent system. Simulation experiments demonstrate that, compared with traditional single-agent algorithms, the proposed method not only exhibits significant advantages in convergence speed and stability but also achieves tighter group formations and more naturally aligned milling behavior. Additionally, a fish-school self-organizing behavior research platform based on an event-triggered mechanism has been developed, providing a robust tool for exploring dynamic behavioral changes under various conditions.
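A minimal sketch of a GCN-based critic is given below: each agent's features are aggregated over its neighbor graph through a normalized adjacency matrix before a pooled value is produced. The layer widths, the mean-pooled value head, and the random neighbor graph are illustrative assumptions, not the paper's architecture.

```python
# GCN critic: neighborhood aggregation via symmetric-normalized adjacency.
import torch
import torch.nn as nn

class GCNCritic(nn.Module):
    def __init__(self, feat_dim, hidden=32):
        super().__init__()
        self.w1 = nn.Linear(feat_dim, hidden)
        self.w2 = nn.Linear(hidden, 1)

    def forward(self, x, adj):
        # Symmetric normalization: D^-1/2 (A + I) D^-1/2
        a_hat = adj + torch.eye(adj.size(0))
        d_inv_sqrt = torch.diag(a_hat.sum(1).pow(-0.5))
        a_norm = d_inv_sqrt @ a_hat @ d_inv_sqrt
        h = torch.relu(self.w1(a_norm @ x))   # aggregate neighbor features
        return self.w2(a_norm @ h).mean()     # pooled joint value estimate

# 5 fish agents, each with position+velocity features (4-D),
# connected by a random undirected neighbor graph.
x = torch.randn(5, 4)
adj = (torch.rand(5, 5) > 0.5).float()
adj = ((adj + adj.t()) > 0).float()
adj.fill_diagonal_(0)
print(GCNCritic(4)(x, adj))
```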