With the advent of sixth-generation mobile communications (6G), space-air-ground integrated networks have become mainstream. This paper focuses on collaborative scheduling for mobile edge computing (MEC) under a three-tier heterogeneous architecture composed of mobile devices, unmanned aerial vehicles (UAVs), and macro base stations (BSs). This scenario typically faces fast channel fading, dynamic computational loads, and energy constraints, whereas classical queuing-theoretic or convex-optimization approaches struggle to yield robust solutions in highly dynamic settings. To address this issue, we formulate a multi-agent Markov decision process (MDP) for an air-ground-fused MEC system, unify link selection, bandwidth/power allocation, and task offloading into a continuous action space, and propose a joint scheduling strategy based on an improved MATD3 algorithm. The improvements include Alternating Layer Normalization (ALN) in the actor to suppress gradient variance, Residual Orthogonalization (RO) in the critic to reduce the correlation between the twin Q-value estimates, and a dynamic-temperature reward to enable adaptive trade-offs during training. On a multi-user, dual-link simulation platform, we conduct ablation and baseline comparisons. The results reveal that the proposed method has better convergence and stability. Compared with MADDPG, TD3, and DSAC, our algorithm achieves more robust performance across key metrics.
In multi-agent systems, joint actions must be employed to achieve cooperation because the evaluation of an agent's behavior often depends on the behaviors of the other agents. However, joint-action reinforcement learning algorithms suffer from slow convergence because of the enormous learning space produced by joint actions. In this article, a prediction-based reinforcement learning algorithm is presented for multi-agent cooperation tasks, which requires all agents to learn to predict the probabilities of the actions that other agents may execute. A multi-robot cooperation experiment is run to test the efficacy of the new algorithm, and the experimental results show that the new algorithm can achieve the cooperation policy much faster than the primitive reinforcement learning algorithm.
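The prediction idea in this abstract can be illustrated with a small tabular sketch. This is not the authors' algorithm: it assumes a single partner agent, estimates the partner's action probabilities from empirical counts, and weights a joint-action Q-table by those probabilities. The class name `PredictiveAgent` and all parameters are hypothetical.

```python
from collections import defaultdict


class PredictiveAgent:
    """Tabular Q-learner that predicts a partner's actions from observed counts.

    Illustrative only: the count-based predictor is an assumption, not the
    method described in the abstract.
    """

    def __init__(self, n_actions, n_other_actions, alpha=0.1, gamma=0.9):
        self.n_actions = n_actions
        self.n_other = n_other_actions
        self.alpha = alpha
        self.gamma = gamma
        # Joint-action values: (state, own_action, other_action) -> Q
        self.q = defaultdict(float)
        # Per-state action counts for the partner, started at 1 (uniform prior)
        self.counts = defaultdict(lambda: [1] * n_other_actions)

    def predict(self, state):
        """Estimated probability of each partner action in this state."""
        c = self.counts[state]
        total = sum(c)
        return [x / total for x in c]

    def expected_q(self, state, action):
        """Value of an own action, averaged over predicted partner actions."""
        p = self.predict(state)
        return sum(p[b] * self.q[(state, action, b)] for b in range(self.n_other))

    def best_action(self, state):
        return max(range(self.n_actions), key=lambda a: self.expected_q(state, a))

    def update(self, state, action, other_action, reward, next_state):
        """Record the partner's action, then apply a Q-learning style update."""
        self.counts[state][other_action] += 1
        next_value = max(self.expected_q(next_state, a) for a in range(self.n_actions))
        target = reward + self.gamma * next_value
        key = (state, action, other_action)
        self.q[key] += self.alpha * (target - self.q[key])
```

Because each agent evaluates only its own actions against a predicted distribution, action selection scans `n_actions` candidates instead of the full joint-action space, which is one plausible reading of why prediction can speed up learning.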
Low Earth orbit (LEO) satellite networks exhibit distinct characteristics, e.g., limited resources of individual satellite nodes and dynamic network topology, which pose many challenges for routing algorithms. To satisfy the quality of service (QoS) requirements of various users, it is critical to research efficient routing strategies that fully utilize satellite resources. This paper proposes a multi-QoS information optimized routing algorithm based on reinforcement learning for LEO satellite networks. It guarantees that services with high-level assurance demands are prioritized under limited satellite resources, while considering the load balancing performance of the satellite network for services with low-level assurance demands, so that satellite resources are fully and effectively utilized. An auxiliary path search algorithm is proposed to accelerate the convergence of the satellite routing algorithm. Simulation results show that the generated routing strategy can process and fully meet the QoS demands of high-assurance services in a timely manner while effectively improving the load balancing performance of the links.
In the rapidly evolving landscape of today's digital economy, Financial Technology (Fintech) emerges as a transformative force, propelled by the dynamic synergy between Artificial Intelligence (AI) and Algorithmic Trading. Our in-depth investigation delves into the intricacies of merging Multi-Agent Reinforcement Learning (MARL) and Explainable AI (XAI) within Fintech, aiming to refine Algorithmic Trading strategies. Through meticulous examination, we uncover the nuanced interactions of AI-driven agents as they collaborate and compete within the financial realm, employing sophisticated deep learning techniques to enhance the clarity and adaptability of trading decisions. These AI-infused Fintech platforms harness collective intelligence to unearth trends, mitigate risks, and provide tailored financial guidance, fostering benefits for individuals and enterprises navigating the digital landscape. Our research holds the potential to revolutionize finance, opening doors to fresh avenues for investment and asset management in the digital age. Additionally, our statistical evaluation yields encouraging results, with metrics such as Accuracy = 0.85, Precision = 0.88, and F1 Score = 0.86, reaffirming the efficacy of our approach within Fintech and emphasizing its reliability and innovative prowess.
Multi-agent reinforcement learning algorithms are studied. A prediction-based multi-agent reinforcement learning algorithm is presented for multi-robot cooperation tasks. A multi-robot cooperation experiment based on a multi-agent inverted pendulum is conducted to test the efficiency of the new algorithm, and the experimental results show that the new algorithm can achieve the cooperation strategy much faster than the primitive multi-agent reinforcement learning algorithm.
Low Earth orbit (LEO) satellite networks can provide wider service coverage and lower latency than traditional terrestrial networks, and have attracted considerable attention. However, the uneven distribution of human population and data traffic on the ground incurs unbalanced traffic load in LEO satellite networks. To this end, we propose a load-balancing routing algorithm for LEO satellite networks based on ant colony optimization and reinforcement learning. In the ant colony algorithm, we improve the pheromone update rule by introducing load-aware heuristic information, e.g., the current node transmission overhead, delay and load status, and reinforcement learning-based link quality evaluation. This enables the routing algorithm to select a lightly loaded node as the next hop to balance the network load. We simulate and verify the proposed algorithm using the NS2 simulation platform, and the results show that our algorithm improves the data delivery ratio and throughput while ensuring lower latency and transmission overhead.
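To make the load-aware next-hop idea concrete, the sketch below applies the standard ant colony transition rule, in which the probability of choosing a neighbor is proportional to pheromone^alpha times heuristic^beta. The heuristic used here, 1 / (delay * (1 + load)), is an assumption chosen for illustration; the abstract does not give the paper's actual formula, and the function name and parameters are hypothetical.

```python
def next_hop_probabilities(pheromone, delay, load, alpha=1.0, beta=2.0):
    """Return a probability for each candidate next hop.

    pheromone, delay, load: dicts keyed by neighbor node id.
    The heuristic 1 / (delay * (1 + load)) is an illustrative assumption:
    it favors neighbors with low delay and low load.
    """
    scores = {}
    for node in pheromone:
        eta = 1.0 / (delay[node] * (1.0 + load[node]))
        scores[node] = (pheromone[node] ** alpha) * (eta ** beta)
    total = sum(scores.values())
    return {node: score / total for node, score in scores.items()}


# Two neighbors with equal pheromone and delay; "B" is heavily loaded.
probs = next_hop_probabilities(
    pheromone={"A": 1.0, "B": 1.0},
    delay={"A": 10.0, "B": 10.0},
    load={"A": 0.1, "B": 0.9},
)
```

With these inputs the lightly loaded neighbor "A" receives the larger share of the probability mass, so ants (and hence traffic) drift away from congested nodes.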
In multi-agent confrontation scenarios, a single jammer is constrained by its limited performance and its inefficiency in practical applications. To cope with these issues, this paper investigates the multi-agent jamming problem in a multi-user scenario, where the coordination between the jammers is considered. Firstly, a multi-agent Markov decision process (MDP) framework is used to model and analyze the multi-agent jamming problem. Secondly, a collaborative multi-agent jamming algorithm (CMJA) based on reinforcement learning is proposed. Finally, an actual intelligent jamming system is designed and built on a software-defined radio (SDR) platform for simulation and platform verification. The simulation and platform verification results show that the proposed CMJA algorithm outperforms the independent Q-learning method and provides a better jamming effect.
Highly intelligent Unmanned Combat Aerial Vehicle (UCAV) formation is expected to bring out strengths in Beyond-Visual-Range (BVR) air combat. Although Multi-Agent Reinforcement Learning (MARL) shows outstanding performance in cooperative decision-making, it is challenging for existing MARL algorithms to quickly converge to an optimal strategy for UCAV formation in BVR air combat, where confrontation is complicated and reward is extremely sparse and delayed. To solve this problem, this paper proposes an Advantage Highlight Multi-Agent Proximal Policy Optimization (AHMAPPO) algorithm. First, at every step, AHMAPPO records the degree to which the best formation exceeds the average of formations in parallel environments and carries out additional advantage sampling according to it. Then, the sampling result is introduced into the updating process of the actor network to improve its optimization efficiency. Finally, the simulation results reveal that, compared with some state-of-the-art MARL algorithms, AHMAPPO can obtain a better strategy using fewer sample episodes in the UCAV formation BVR air combat simulation environment built in this paper, which can reflect the critical features of BVR air combat. AHMAPPO can significantly increase the convergence efficiency of the strategy for UCAV formation in BVR air combat, with a maximum increase of 81.5% relative to other algorithms.
Distributed wireless sensor networks have been shown to be effective for environmental monitoring tasks, in which multiple sensors are deployed across a wide range of environments to collect information or monitor a particular event. Wireless sensor networks, consisting of a large number of interacting sensors, have been successful in a variety of applications where they are able to share information using different transmission protocols through the communication network. However, irregular and dynamic environments require traditional wireless sensor networks to communicate frequently to exchange the most recent information, which can easily generate high communication cost through collaborative data collection and data transmission. High-frequency communication also has a high probability of failure because of long-distance data transmission. In this paper, we develop a novel approach to a multi-sensor environment monitoring network using the idea of a distributed system. Its communication network can overcome the difficulties of high communication cost and Single Point of Failure (SPOF) through a decentralized approach that performs in-network computation. Our approach makes use of Boolean networks, which allow for a non-complex method of corroboration and retain meaningful information regarding the dynamics of the communication network. Our approach also reduces the complexity of the data aggregation process and employs a reinforcement learning algorithm to predict future events inside the environment through pattern recognition.
The rapid growth of network services leads to long service request processing times and high deployment costs when deploying network function virtualization service function chains (SFCs) under 5G networks. To address these problems, this paper proposes a multi-agent deep deterministic policy gradient optimization algorithm for SFC deployment (MADDPG-SD). First, an SFC optimization model that increases the request acceptance rate while minimizing latency and deployment cost is constructed for the network resource-constrained case. Subsequently, we model the dynamic problem as a Markov decision process (MDP), facilitating adaptation to the evolving states of network resources. Finally, by allocating SFCs to different agents and adopting a collaborative deployment strategy, each agent aims to maximize the request acceptance rate or minimize latency and costs. These agents learn strategies from historical data of virtual network functions in SFCs to guide server node selection, and achieve approximately optimal SFC deployment strategies through a cooperative framework of centralized training and distributed execution. Experimental simulation results indicate that the proposed method, while simultaneously meeting performance requirements and resource capacity constraints, effectively increases the request acceptance rate compared with the baseline algorithms, reducing the end-to-end latency by 4.942% and the deployment cost by 8.045%.
Multi-Agent Reinforcement Learning (MARL) has proven to be successful in cooperative assignments. MARL is used to investigate how autonomous agents with the same interests can connect and act in one team. MARL cooperation scenarios are explored in recreational cooperative augmented reality environments, as well as real-world scenarios in robotics. In this paper, we explore the realm of MARL and its potential applications in cooperative assignments. Our focus is on developing a multi-agent system that can collaborate to attack or defend against enemies and achieve victory with minimal damage. To accomplish this, we utilize the StarCraft Multi-Agent Challenge (SMAC) environment and train four MARL algorithms: Q-learning with Mixtures of Experts (QMIX), Value-Decomposition Network (VDN), Multi-agent Proximal Policy Optimizer (MAPPO), and Multi-Agent Actor Attention Critic (MAA2C). These algorithms allow multiple agents to cooperate in a specific scenario to achieve the targeted mission. Our results show that the QMIX algorithm outperforms the other three algorithms in the attacking scenario, while the VDN algorithm achieves the best results in the defending scenario. Specifically, the VDN algorithm reaches the highest value of battle won mean and the lowest value of dead allies mean. Our research demonstrates the potential for MARL algorithms to be used in real-world applications, such as controlling multiple robots to provide helpful services or coordinating teams of agents to accomplish tasks that would be impossible for a human to do. The SMAC environment provides a unique opportunity to test and evaluate MARL algorithms in a challenging and dynamic environment, and our results show that these algorithms can be used to achieve victory with minimal damage.
This article presents an event-triggered H∞ consensus control scheme using reinforcement learning (RL) for nonlinear second-order multi-agent systems (MASs) with control constraints. First, considering control constraints, the constrained H∞ consensus problem is transformed into a multi-player zero-sum game with non-quadratic performance functions. Then, an event-triggered control method is presented to conserve communication resources, and a new triggering condition is developed for each agent to make the triggering threshold independent of the disturbance attenuation level. To derive the optimal controller that minimizes the cost function under the worst-case disturbance, a constrained Hamilton–Jacobi–Bellman (HJB) equation is defined. Since this equation is difficult to solve analytically due to its strong nonlinearity, RL is implemented to obtain the optimal controller. Specifically, the optimal performance function and the worst-case disturbance are approximated by a time-triggered critic network; meanwhile, the optimal controller is approximated by an event-triggered actor network. After that, Lyapunov analysis is utilized to prove the uniformly ultimately bounded (UUB) stability of the system and that the network weight errors are UUB. Finally, a simulation example is utilized to demonstrate the effectiveness of the proposed control strategy.
The overall performance of multi-robot collaborative systems is significantly affected by multi-robot task allocation. To improve the effectiveness, robustness, and safety of multi-robot collaborative systems, a multimodal multi-objective evolutionary algorithm based on deep reinforcement learning is proposed in this paper. The improved multimodal multi-objective evolutionary algorithm is used to solve multi-robot task allocation problems. Moreover, a deep reinforcement learning strategy is used in the last generation to provide a high-quality path for each assigned robot in an end-to-end manner. Comparisons with three popular multimodal multi-objective evolutionary algorithms on three different scenarios of multi-robot task allocation problems are carried out to verify the performance of the proposed algorithm. The experimental results show that the proposed algorithm can generate sufficient equivalent schemes to improve the availability and robustness of multi-robot collaborative systems in uncertain environments, and can also produce the best scheme to improve the overall task execution efficiency of multi-robot collaborative systems.
The Chimp Optimization Algorithm (ChOA) is one of the most efficient recent optimization algorithms and has proved its ability to deal with different problems in various domains. However, ChOA suffers from a weak local search technique, which leads to a loss of diversity, getting stuck in local minima, and premature convergence. In response to these defects, this paper proposes an improved ChOA algorithm that uses Opposition-based Learning (OBL) to enhance the choice of better solutions, denoted OChOA. Reinforcement Learning (RL) is then used to improve the local search technique of OChOA, yielding RLOChOA. This effectively prevents the algorithm from falling into local optima. The performance of the proposed RLOChOA algorithm is evaluated using the Friedman rank test on a set of CEC 2015 and CEC 2017 benchmark functions and a set of CEC 2011 real-world problems. Numerical results and statistical experiments show that RLOChOA provides better solution quality, convergence accuracy, and stability compared with other state-of-the-art algorithms.
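The opposition-based learning step mentioned in this abstract has a simple standard form: for a candidate x in the box [lower, upper], the opposite point is lower + upper - x, and the better of the two is kept. The sketch below shows only that generic step for a minimization problem; how the paper combines it with ChOA and RL is not specified in the abstract, and the function names are hypothetical.

```python
def opposite(x, lower, upper):
    """Opposite point of x within per-dimension bounds [lower, upper]."""
    return [lo + hi - xi for xi, lo, hi in zip(x, lower, upper)]


def obl_select(population, lower, upper, fitness):
    """Keep the best len(population) points among candidates and their opposites.

    Assumes minimization: smaller fitness is better.
    """
    candidates = population + [opposite(x, lower, upper) for x in population]
    return sorted(candidates, key=fitness)[: len(population)]


def sphere(x):
    """Simple test objective with its minimum at the origin."""
    return sum(v * v for v in x)


# The opposite of [4, 4] in the box [0, 5]^2 is [1, 1], which is closer to the optimum.
selected = obl_select([[4.0, 4.0]], [0.0, 0.0], [5.0, 5.0], sphere)
```

Evaluating both a point and its opposite doubles the number of fitness evaluations per step, which is the usual price paid for the extra diversity.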
Dynamic path planning is crucial for mobile robots to navigate successfully in unstructured environments. To achieve a globally optimal path and real-time dynamic obstacle avoidance during movement, a dynamic path planning algorithm incorporating improved IB-RRT* and deep reinforcement learning (DRL) is proposed. Firstly, an improved IB-RRT* algorithm is proposed for global path planning by combining double elliptic subset sampling and probabilistic central circle target bias. Then, to tackle the slow response to dynamic obstacles and inadequate obstacle avoidance of traditional local path planning algorithms, deep reinforcement learning is utilized to predict the movement trend of dynamic obstacles, leading to a dynamic fusion path planning. Finally, the simulation and experiment results demonstrate that the proposed improved IB-RRT* algorithm has higher convergence speed and search efficiency compared with the traditional Bi-RRT*, Informed-RRT*, and IB-RRT* algorithms. Furthermore, the proposed fusion algorithm can effectively perform real-time obstacle avoidance and navigation tasks for mobile robots in unstructured environments.
A new algorithm is proposed that potentially sacrifices the optimality of control policies in order to obtain robust solutions. Robustness may become a very important property for a learning system when the theoretical model does not match the practical physical system, when the practical system is not static, or when the availability of a control action changes over time. The main contribution is a set of approximation algorithms together with their convergence results. A generalized average operator, used in place of the usual optimality operator max (or min), is applied to study an important class of learning algorithms, dynamic programming algorithms, and their convergence is discussed from a theoretical point of view. The purpose of this research is to improve the robustness of reinforcement learning algorithms theoretically.
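The abstract does not define its generalized average operator, so the sketch below uses one well-known candidate purely as an assumption: the log-sum-exp (exponential) mean, which approaches the arithmetic mean as beta tends to 0 and approaches max as beta grows. Both function names and the parameter `beta` are hypothetical.

```python
import math


def generalized_average(values, beta):
    """Exponential mean: (1/beta) * log(mean(exp(beta * v))).

    Assumed stand-in for the paper's operator. Subtracting the maximum
    first keeps the exponentials numerically stable. Requires beta > 0.
    """
    m = max(values)
    mean_exp = sum(math.exp(beta * (v - m)) for v in values) / len(values)
    return m + math.log(mean_exp) / beta


def soft_backup(reward, gamma, next_q_values, beta):
    """One dynamic programming backup with max replaced by the generalized average."""
    return reward + gamma * generalized_average(next_q_values, beta)
```

Because the result always lies between the mean and the maximum of the action values, a small `beta` spreads credit over several good actions, which is one way to trade optimality for robustness when an action may become unavailable.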
An aero-engine maintenance policy plays a crucial role in reasonably reducing maintenance cost. An aero-engine is a type of complex equipment with long service-life. In engineering, a hybrid maintenance strategy is adopted to improve the aero-engine operational reliability. Thus, the long service-life and the hybrid maintenance strategy should be considered synchronously in aero-engine maintenance policy optimization. This paper proposes an aero-engine life-cycle maintenance policy optimization algorithm that synchronously considers the long service-life and the hybrid maintenance strategy. The reinforcement learning approach was adopted to illustrate the optimization framework, in which maintenance policy optimization was formulated as a Markov decision process. In the reinforcement learning framework, the Gauss–Seidel value iteration algorithm was adopted to optimize the maintenance policy. Compared with traditional aero-engine maintenance policy optimization methods, the long service-life and the hybrid maintenance strategy could be addressed synchronously by the proposed algorithm. Two numerical experiments and algorithm analyses were performed to illustrate the optimization algorithm in detail.
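Gauss–Seidel value iteration differs from the standard (Jacobi-style) variant in that each state's value is overwritten in place, so later states in the same sweep already use the updated values. The following is a generic sketch of that technique on a finite MDP, not the paper's maintenance model; the data layout (`P[s][a]` as a list of `(probability, next_state)` pairs, `R[s][a]` as the immediate reward) is an assumption.

```python
def gauss_seidel_value_iteration(P, R, gamma=0.9, tol=1e-8, max_sweeps=10_000):
    """Solve a finite MDP by in-place (Gauss-Seidel) value iteration.

    P[s][a]: list of (probability, next_state) pairs.
    R[s][a]: immediate reward for taking action a in state s.
    Returns (values, greedy_policy).
    """
    n_states = len(P)
    V = [0.0] * n_states

    def action_value(s, a):
        return R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])

    for _ in range(max_sweeps):
        delta = 0.0
        for s in range(n_states):
            best = max(action_value(s, a) for a in range(len(P[s])))
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place update: later states see this value immediately
        if delta < tol:
            break

    policy = [
        max(range(len(P[s])), key=lambda a, s=s: action_value(s, a))
        for s in range(n_states)
    ]
    return V, policy
```

To cast a maintenance problem in this form, costs can be entered as negative rewards, with states standing for degradation levels and actions for the available maintenance options.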
This paper investigates the challenges associated with Unmanned Aerial Vehicle (UAV) collaborative search and target tracking in dynamic and unknown environments characterized by limited field of view. The primary objective is to explore the unknown environments to locate and track targets effectively. To address this problem, we propose a novel Multi-Agent Reinforcement Learning (MARL) method based on Graph Neural Network (GNN). Firstly, a method is introduced for encoding continuous-space multi-UAV problem data into spatial graphs which establish essential relationships among agents, obstacles, and targets. Secondly, a Graph Attention Network (GAT) model is presented, which focuses exclusively on adjacent nodes, learns attention weights adaptively, and allows agents to better process information in dynamic environments. Reward functions are specifically designed to tackle exploration challenges in environments with sparse rewards. By introducing a framework that integrates centralized training and distributed execution, the advancement of models is facilitated. Simulation results show that the proposed method outperforms the existing MARL method in search rate and tracking performance with fewer collisions. The experiments show that the proposed method can be extended to applications with a larger number of agents, which provides a potential solution to the challenging problem of multi-UAV autonomous tracking in dynamic unknown environments.
Cooperative multi-agent reinforcement learning (MARL) is a key technology for enabling cooperation in complex multi-agent systems. It has achieved remarkable progress in areas such as gaming, autonomous driving, and multi-robot control. Empowering cooperative MARL with multi-task decision-making capabilities is expected to further broaden its application scope. In multi-task scenarios, cooperative MARL algorithms need to address 3 types of multi-task problems: reward-related multi-task, arising from different reward functions; multi-domain multi-task, caused by differences in state and action spaces and state transition functions; and scalability-related multi-task, resulting from the dynamic variation in the number of agents. Most existing studies focus on scalability-related multi-task problems. However, with the increasing integration between large language models (LLMs) and multi-agent systems, a growing number of LLM-based multi-agent systems have emerged, enabling more complex multi-task cooperation. This paper provides a comprehensive review of the latest advances in this field. By combining multi-task reinforcement learning with cooperative MARL, we categorize and analyze the 3 major types of multi-task problems under multi-agent settings, offering more fine-grained classifications and summarizing key insights for each. In addition, we summarize commonly used benchmarks and discuss future directions of research in this area, which hold promise for further enhancing the multi-task cooperation capabilities of multi-agent systems and expanding their practical applications in the real world.
Due to the characteristics of line-of-sight (LoS) communication in unmanned aerial vehicle (UAV) networks, these systems are highly susceptible to eavesdropping and surveillance. To effectively address the security concerns in UAV communication, covert communication methods have been adopted. This paper explores the joint optimization problem of trajectory and transmission power in a multi-hop UAV relay covert communication system. Considering the communication covertness, power constraints, and trajectory limitations, an algorithm based on multi-agent proximal policy optimization (MAPPO), named covert-MAPPO (C-MAPPO), is proposed. The proposed method leverages the strengths of both optimization algorithms and reinforcement learning to analyze and make joint decisions on the transmission power and flight trajectory strategies for UAVs to achieve cooperation. Simulation results demonstrate that the proposed method can maximize the system throughput while satisfying covertness constraints, and it outperforms benchmark algorithms in terms of system throughput and reward convergence speed.
Funding: National Key Research and Development Program (2021YFB2900604).
Funding: This project was funded by the Deanship of Scientific Research (DSR) at King Abdulaziz University, Jeddah, under Grant No. (IFPIP-1127-611-1443). The authors, therefore, acknowledge with thanks DSR technical and financial support.
Abstract: In the rapidly evolving landscape of today's digital economy, Financial Technology (Fintech) emerges as a transformative force, propelled by the dynamic synergy between Artificial Intelligence (AI) and Algorithmic Trading. Our in-depth investigation delves into the intricacies of merging Multi-Agent Reinforcement Learning (MARL) and Explainable AI (XAI) within Fintech, aiming to refine Algorithmic Trading strategies. Through meticulous examination, we uncover the nuanced interactions of AI-driven agents as they collaborate and compete within the financial realm, employing sophisticated deep learning techniques to enhance the clarity and adaptability of trading decisions. These AI-infused Fintech platforms harness collective intelligence to unearth trends, mitigate risks, and provide tailored financial guidance, fostering benefits for individuals and enterprises navigating the digital landscape. Our research holds the potential to revolutionize finance, opening doors to fresh avenues for investment and asset management in the digital age. Additionally, our statistical evaluation yields encouraging results, with metrics such as Accuracy = 0.85, Precision = 0.88, and F1 Score = 0.86, reaffirming the efficacy of our approach within Fintech and emphasizing its reliability and innovative prowess.
Funding: Sponsored by the Ministerial Level Foundation (70302).
Abstract: Multi-agent reinforcement learning algorithms are studied. A prediction-based multi-agent reinforcement learning algorithm is presented for multi-robot cooperation tasks. A multi-robot cooperation experiment based on a multi-agent inverted pendulum is conducted to test the efficiency of the new algorithm, and the experimental results show that the new algorithm can achieve the cooperation strategy much faster than the primitive multi-agent reinforcement learning algorithm.
Funding: Supported in part by the National Natural Science Foundation of China (Grant No. 62273107, 61702127, 62272113) and the Science and Technology Program of Guangzhou (Grant No. 201804010461).
Abstract: Low earth orbit (LEO) satellite networks can provide wider service coverage and lower latency than traditional terrestrial networks, and have attracted considerable attention. However, the uneven distribution of human population and data traffic on the ground incurs unbalanced traffic load in LEO satellite networks. To this end, we propose a load-balancing routing algorithm for LEO satellite networks based on ant colony optimization and reinforcement learning. In the ant colony algorithm, we improve the pheromone update rule by introducing load-aware heuristic information, e.g., the current node transmission overhead, delay and load status, and reinforcement learning-based link quality evaluation. It enables the routing algorithm to select the lightly loaded node as the next hop to balance the network load. We simulate and verify the proposed algorithm using the NS2 simulation platform, and the results show that our algorithm improves the data delivery ratio and throughput while ensuring lower latency and transmission overhead.
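How load-aware heuristic information can steer an ant toward a lightly loaded next hop may be sketched as follows. This is only the textbook ant colony transition rule (probability proportional to pheromone^alpha times heuristic^beta) with an assumed heuristic; the abstract does not give the paper's actual pheromone update or heuristic, and the field names `pheromone`, `delay_ms`, and `load` as well as the satellite names are hypothetical.

```python
def heuristic(neighbor):
    """Assumed load-aware desirability: lower delay and lower load score higher."""
    return 1.0 / (neighbor["delay_ms"] * (1.0 + neighbor["load"]))


def next_hop_probabilities(neighbors, alpha=1.0, beta=2.0):
    """Textbook ACO transition rule: p_j proportional to pheromone_j^alpha * heuristic_j^beta."""
    weights = {
        name: (n["pheromone"] ** alpha) * (heuristic(n) ** beta)
        for name, n in neighbors.items()
    }
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


# Two hypothetical neighbor satellites that differ only in load (fraction of capacity in use).
neighbors = {
    "sat_a": {"pheromone": 1.0, "delay_ms": 10.0, "load": 0.1},
    "sat_b": {"pheromone": 1.0, "delay_ms": 10.0, "load": 0.9},
}
probs = next_hop_probabilities(neighbors)
```

With equal pheromone and delay, the lightly loaded `sat_a` receives the larger selection probability, which is the balancing effect the abstract describes.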
Funding: Supported by the National Natural Science Foundation of China (No. 62071488 and No. 62061013).
Abstract: In multi-agent confrontation scenarios, a single jammer is constrained by its limited performance and is inefficient in practical applications. To cope with these issues, this paper investigates the multi-agent jamming problem in a multi-user scenario, where the coordination between the jammers is considered. Firstly, a multi-agent Markov decision process (MDP) framework is used to model and analyze the multi-agent jamming problem. Secondly, a collaborative multi-agent jamming algorithm (CMJA) based on reinforcement learning is proposed. Finally, an actual intelligent jamming system is designed and built on a software-defined radio (SDR) platform for simulation and platform verification. The simulation and platform verification results show that the proposed CMJA algorithm outperforms the independent Q-learning method and provides a better jamming effect.
Funding: Co-supported by the National Natural Science Foundation of China (No. 52272382), the Aeronautical Science Foundation of China (No. 20200017051001), and the Fundamental Research Funds for the Central Universities, China.
Abstract: Highly intelligent Unmanned Combat Aerial Vehicle (UCAV) formation is expected to bring out strengths in Beyond-Visual-Range (BVR) air combat. Although Multi-Agent Reinforcement Learning (MARL) shows outstanding performance in cooperative decision-making, it is challenging for existing MARL algorithms to quickly converge to an optimal strategy for UCAV formation in BVR air combat, where confrontation is complicated and reward is extremely sparse and delayed. Aiming to solve this problem, this paper proposes an Advantage Highlight Multi-Agent Proximal Policy Optimization (AHMAPPO) algorithm. First, at every step, the AHMAPPO records the degree to which the best formation exceeds the average of formations in parallel environments and carries out additional advantage sampling according to it. Then, the sampling result is introduced into the updating process of the actor network to improve its optimization efficiency. Finally, the simulation results reveal that, compared with some state-of-the-art MARL algorithms, the AHMAPPO can obtain a better strategy using fewer sample episodes in the UCAV formation BVR air combat simulation environment built in this paper, which can reflect the critical features of BVR air combat. The AHMAPPO can significantly increase the convergence efficiency of the strategy for UCAV formation in BVR air combat, with a maximum increase of 81.5% relative to other algorithms.
Funding: This research is supported by the Natural Science Foundation of Hunan Province (No. 2019JJ40145), the Scientific Research Key Project of Hunan Education Department (No. 19A273), and the Open Fund of Key Laboratory of Hunan Province (2017TP1026).
Abstract: Distributed wireless sensor networks have been shown to be effective for environmental monitoring tasks, in which multiple sensors are deployed across a wide range of environments to collect information or monitor a particular event. Wireless sensor networks, consisting of a large number of interacting sensors, have been successful in a variety of applications where they are able to share information using different transmission protocols through the communication network. However, irregular and dynamic environments require traditional wireless sensor networks to communicate frequently to exchange the most recent information, which can easily generate high communication cost through collaborative data collection and data transmission. High-frequency communication also has a high probability of failure because of long-distance data transmission. In this paper, we develop a novel approach to multi-sensor environment monitoring networks using the idea of a distributed system. Its communication network can overcome the difficulties of high communication cost and Single Point of Failure (SPOF) through a decentralized approach, which performs in-network computation. Our approach makes use of Boolean networks, which allow for a non-complex method of corroboration and retain meaningful information regarding the dynamics of the communication network. Our approach also reduces the complexity of the data aggregation process and employs a reinforcement learning algorithm to predict future events inside the environment through pattern recognition.
Funding: The financial support from the Major Science and Technology Programs in Henan Province (Grant No. 241100210100), the National Natural Science Foundation of China (Grant No. 62102372), the Henan Provincial Department of Science and Technology Research Project (Grant No. 242102211068), the Henan Provincial Department of Science and Technology Research Project (Grant No. 232102210078), the Stabilization Support Program of The Shenzhen Science and Technology Innovation Commission (Grant No. 20231130110921001), and the Key Scientific Research Project of Higher Education Institutions of Henan Province (Grant No. 24A520042) is acknowledged.
Abstract: The rapid growth of network services leads to long service request processing times and high deployment costs when deploying network function virtualization service function chains (SFCs) under 5G networks. To address these problems, this paper proposes a multi-agent deep deterministic policy gradient optimization algorithm for SFC deployment (MADDPG-SD). Initially, an optimization model that enhances the request acceptance rate and minimizes the latency and deployment cost of SFCs is constructed for the network resource-constrained case. Subsequently, we model the dynamic problem as a Markov decision process (MDP), facilitating adaptation to the evolving states of network resources. Finally, by allocating SFCs to different agents and adopting a collaborative deployment strategy, each agent aims to maximize the request acceptance rate or minimize latency and costs. These agents learn strategies from historical data of virtual network functions in SFCs to guide server node selection, and achieve approximately optimal SFC deployment strategies through a cooperative framework of centralized training and distributed execution. Experimental simulation results indicate that the proposed method, while simultaneously meeting performance requirements and resource capacity constraints, effectively increases the acceptance rate of requests compared with the baseline algorithms, reducing the end-to-end latency by 4.942% and the deployment cost by 8.045%.
Funding: Supported in part by the United States Air Force Research Institute for Tactical Autonomy (RITA) University Affiliated Research Center (UARC), in part by the United States Air Force Office of Scientific Research (AFOSR) Contract FA9550-22-1-0268 awarded to KHA, https://www.afrl.af.mil/AFOSR/ (the contract is entitled "Investigating Improving Safety of Autonomous Exploring Intelligent Agents with Human-in-the-Loop Reinforcement Learning"), and in part by Jackson State University.
Abstract: Multi-Agent Reinforcement Learning (MARL) has proven to be successful in cooperative assignments. MARL is used to investigate how autonomous agents with the same interests can connect and act in one team. MARL cooperation scenarios are explored in recreational cooperative augmented reality environments, as well as real-world scenarios in robotics. In this paper, we explore the realm of MARL and its potential applications in cooperative assignments. Our focus is on developing a multi-agent system that can collaborate to attack or defend against enemies and achieve victory with minimal damage. To accomplish this, we utilize the StarCraft Multi-Agent Challenge (SMAC) environment and train four MARL algorithms: Q-learning with Mixtures of Experts (QMIX), Value-Decomposition Network (VDN), Multi-agent Proximal Policy Optimizer (MAPPO), and Multi-Agent Actor Attention Critic (MAA2C). These algorithms allow multiple agents to cooperate in a specific scenario to achieve the targeted mission. Our results show that the QMIX algorithm outperforms the other three algorithms in the attacking scenario, while the VDN algorithm achieves the best results in the defending scenario. Specifically, the VDN algorithm reaches the highest value of battle won mean and the lowest value of dead allies mean. Our research demonstrates the potential for MARL algorithms to be used in real-world applications, such as controlling multiple robots to provide helpful services or coordinating teams of agents to accomplish tasks that would be impossible for a human to do. The SMAC environment provides a unique opportunity to test and evaluate MARL algorithms in a challenging and dynamic environment, and our results show that these algorithms can be used to achieve victory with minimal damage.
Abstract: This article presents an event-triggered H∞ consensus control scheme using reinforcement learning (RL) for nonlinear second-order multi-agent systems (MASs) with control constraints. First, considering control constraints, the constrained H∞ consensus problem is transformed into a multi-player zero-sum game with non-quadratic performance functions. Then, an event-triggered control method is presented to conserve communication resources, and a new triggering condition is developed for each agent to make the triggering threshold independent of the disturbance attenuation level. To derive the optimal controller that minimizes the cost function under the worst-case disturbance, a constrained Hamilton–Jacobi–Bellman (HJB) equation is defined. Since this equation is difficult to solve analytically due to its strong nonlinearity, RL is implemented to obtain the optimal controller. Specifically, the optimal performance function and the worst-case disturbance are approximated by a time-triggered critic network; meanwhile, the optimal controller is approximated by an event-triggered actor network. After that, Lyapunov analysis is utilized to prove the uniformly ultimately bounded (UUB) stability of the system and that the network weight errors are UUB. Finally, a simulation example is utilized to demonstrate the effectiveness of the control strategy provided.
Funding: The Shanghai Pujiang Program (No. 22PJD030), the National Natural Science Foundation of China (Nos. 61603244 and 71904116), and the National Natural Science Foundation of China-Shandong Joint Fund (No. U2006228).
Abstract: The overall performance of multi-robot collaborative systems is significantly affected by multi-robot task allocation. To improve the effectiveness, robustness, and safety of multi-robot collaborative systems, a multimodal multi-objective evolutionary algorithm based on deep reinforcement learning is proposed in this paper. The improved multimodal multi-objective evolutionary algorithm is used to solve multi-robot task allocation problems. Moreover, a deep reinforcement learning strategy is used in the last generation to provide a high-quality path for each assigned robot in an end-to-end manner. Comparisons with three popular multimodal multi-objective evolutionary algorithms on three different scenarios of multi-robot task allocation problems are carried out to verify the performance of the proposed algorithm. The experimental results show that the proposed algorithm can generate sufficient equivalent schemes to improve the availability and robustness of multi-robot collaborative systems in uncertain environments, and can also produce the best scheme to improve the overall task execution efficiency of multi-robot collaborative systems.
Abstract: The Chimp Optimization Algorithm (ChOA) is one of the most efficient recent optimization algorithms and has proved its ability to deal with different problems in various domains. However, ChOA suffers from a weak local search technique, which leads to a loss of diversity, getting stuck in local minima, and premature convergence. In response to these defects, this paper proposes an improved ChOA algorithm that uses Opposition-based learning (OBL) to enhance the choice of better solutions, denoted OChOA. Reinforcement Learning (RL) is then used to improve the local search technique of OChOA, yielding RLOChOA. This effectively prevents the algorithm from falling into local optima. The performance of the proposed RLOChOA algorithm is evaluated using the Friedman rank test on a set of CEC 2015 and CEC 2017 benchmark functions and a set of CEC 2011 real-world problems. Numerical results and statistical experiments show that RLOChOA provides better solution quality, convergence accuracy, and stability compared with other state-of-the-art algorithms.
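Opposition-based learning can be shown in a few lines. The sketch below uses the standard OBL definition (the opposite of x in the box [lower, upper] is lower + upper - x) and keeps the best candidates among a population and its opposites. How OChOA embeds this step inside the chimp update is not stated in the abstract, so the selection rule, the objective `shifted_sphere`, and all names here are illustrative assumptions.

```python
import random


def opposite(x, lower, upper):
    """Standard OBL: reflect each coordinate about the midpoint of its bounds."""
    return [lo + hi - xi for xi, lo, hi in zip(x, lower, upper)]


def obl_select(population, lower, upper, fitness):
    """Keep the best len(population) points among candidates and their opposites (minimization)."""
    merged = population + [opposite(x, lower, upper) for x in population]
    return sorted(merged, key=fitness)[: len(population)]


def shifted_sphere(x):
    """Hypothetical test objective with its minimum at (3, 3)."""
    return sum((v - 3.0) ** 2 for v in x)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
lower, upper = [-5.0, -5.0], [15.0, 15.0]
population = [[rng.uniform(lo, hi) for lo, hi in zip(lower, upper)] for _ in range(6)]
improved = obl_select(population, lower, upper, shifted_sphere)
```

Because the original points are always among the candidates, the best fitness after selection can never be worse than before it.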
Funding: The National Natural Science Foundation of China (No. 61973275).
Abstract: Dynamic path planning is crucial for mobile robots to navigate successfully in unstructured environments. To achieve a globally optimal path and real-time dynamic obstacle avoidance during movement, a dynamic path planning algorithm incorporating improved IB-RRT* and deep reinforcement learning (DRL) is proposed. Firstly, an improved IB-RRT* algorithm is proposed for global path planning by combining double elliptic subset sampling and probabilistic central circle target bias. Then, to tackle the slow response to dynamic obstacles and inadequate obstacle avoidance of traditional local path planning algorithms, deep reinforcement learning is utilized to predict the movement trend of dynamic obstacles, leading to a dynamic fusion path planning. Finally, the simulation and experiment results demonstrate that the proposed improved IB-RRT* algorithm has higher convergence speed and search efficiency compared with traditional Bi-RRT*, Informed-RRT*, and IB-RRT* algorithms. Furthermore, the proposed fusion algorithm can effectively perform real-time obstacle avoidance and navigation tasks for mobile robots in unstructured environments.
Funding: Project supported by the National Natural Science Foundation of China (Nos. 10471088 and 60572126).
Abstract: A new algorithm is proposed that potentially sacrifices the optimality of control policies in order to obtain robust solutions. Robustness of solutions may become a very important property for a learning system when there is a mismatch between the theoretical model and the practical physical system, when the practical system is not static, or when the availability of a control action changes over time. The main contribution is a set of approximation algorithms together with their convergence results. A generalized average operator, instead of the usual optimal operator max (or min), is applied to study a class of important learning algorithms, dynamic programming algorithms, and to discuss their convergence from a theoretical point of view. The purpose of this research is to improve the robustness of reinforcement learning algorithms theoretically.
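Replacing max with an averaging operator in a dynamic programming backup can be sketched as follows. The abstract does not define its generalized average operator, so this example substitutes one well-known choice as an assumption: the log-mean-exp of the action values, which approaches the arithmetic mean as `beta` goes to 0 and the max as `beta` grows. The two-state MDP and the names `generalized_average` and `averaged_value_iteration` are hypothetical.

```python
import math


def generalized_average(values, beta):
    """Log-mean-exp average (assumed operator): beta -> 0 gives the mean, beta -> +inf the max."""
    if beta == 0:
        return sum(values) / len(values)
    # Shift by the extreme value so the exponentials cannot overflow.
    m = max(values) if beta > 0 else min(values)
    mean_exp = sum(math.exp(beta * (v - m)) for v in values) / len(values)
    return m + math.log(mean_exp) / beta


def averaged_value_iteration(P, R, gamma, beta, sweeps=500):
    """Value iteration whose backup uses generalized_average in place of max over actions."""
    n_states = len(R)
    V = [0.0] * n_states
    for _ in range(sweeps):
        V = [
            generalized_average(
                [R[s][a] + gamma * sum(P[a][s][t] * V[t] for t in range(n_states))
                 for a in range(len(R[s]))],
                beta,
            )
            for s in range(n_states)
        ]
    return V


# Hypothetical 2-state, 2-action MDP: P[a][s][t] is a transition probability, R[s][a] a reward.
P = [[[0.9, 0.1], [0.2, 0.8]], [[0.5, 0.5], [0.6, 0.4]]]
R = [[1.0, 0.0], [0.0, 2.0]]
V_soft = averaged_value_iteration(P, R, gamma=0.9, beta=0.1)
V_sharp = averaged_value_iteration(P, R, gamma=0.9, beta=50.0)
```

A small `beta` yields more conservative values because every action contributes to the backup, which is one way to trade optimality for robustness when some actions may become unavailable.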
Funding: Co-supported by the Key National Natural Science Foundation of China (No. U1533202), the Civil Aviation Administration of China (No. MHRD20150104), and the Shandong Independent Innovation and Achievements Transformation Fund, China (No. 2014CGZH1101).
Abstract: An aero-engine maintenance policy plays a crucial role in reasonably reducing maintenance cost. An aero-engine is a type of complex equipment with a long service life. In engineering, a hybrid maintenance strategy is adopted to improve aero-engine operational reliability. Thus, the long service life and the hybrid maintenance strategy should be considered simultaneously in aero-engine maintenance policy optimization. This paper proposes an aero-engine life-cycle maintenance policy optimization algorithm that considers both. A reinforcement learning approach was adopted as the optimization framework, in which maintenance policy optimization was formulated as a Markov decision process. Within this framework, the Gauss–Seidel value iteration algorithm was adopted to optimize the maintenance policy. Unlike traditional aero-engine maintenance policy optimization methods, the proposed algorithm can address the long service life and the hybrid maintenance strategy simultaneously. Two numerical experiments and algorithm analyses were performed to illustrate the optimization algorithm in detail.
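Gauss–Seidel value iteration differs from the standard variant in that each state's value is overwritten in place, so later states in the same sweep already use the updated values. The sketch below applies it to a toy two-state maintenance problem. The states, actions, costs, and transition probabilities are invented for illustration and are not taken from the paper.

```python
def gauss_seidel_value_iteration(P, R, gamma=0.9, tol=1e-10, max_sweeps=10_000):
    """In-place value iteration; P[a][s][t] are transition probabilities, R[s][a] rewards."""
    n_states, n_actions = len(R), len(R[0])

    def q_value(s, a, V):
        return R[s][a] + gamma * sum(P[a][s][t] * V[t] for t in range(n_states))

    V = [0.0] * n_states
    for _ in range(max_sweeps):
        delta = 0.0
        for s in range(n_states):
            best = max(q_value(s, a, V) for a in range(n_actions))
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # Gauss-Seidel step: the new value is used immediately in this sweep
        if delta < tol:
            break
    policy = [max(range(n_actions), key=lambda a: q_value(s, a, V)) for s in range(n_states)]
    return V, policy


# Toy engine model (assumed): state 0 = healthy, 1 = degraded; action 0 = operate, 1 = overhaul.
P = [
    [[0.8, 0.2], [0.0, 1.0]],  # operate: a healthy engine may degrade, a degraded one stays so
    [[1.0, 0.0], [1.0, 0.0]],  # overhaul: the engine always returns to healthy
]
R = [[0.0, -5.0], [-3.0, -5.0]]  # rewards are negative costs
V, policy = gauss_seidel_value_iteration(P, R)
```

In this toy model the resulting policy keeps operating a healthy engine and overhauls a degraded one.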
Funding: Supported by the National Natural Science Foundation of China (Nos. 12272104, U22B2013).
Abstract: This paper investigates the challenges associated with Unmanned Aerial Vehicle (UAV) collaborative search and target tracking in dynamic and unknown environments characterized by a limited field of view. The primary objective is to explore the unknown environments to locate and track targets effectively. To address this problem, we propose a novel Multi-Agent Reinforcement Learning (MARL) method based on Graph Neural Network (GNN). Firstly, a method is introduced for encoding continuous-space multi-UAV problem data into spatial graphs which establish essential relationships among agents, obstacles, and targets. Secondly, a Graph AttenTion network (GAT) model is presented, which focuses exclusively on adjacent nodes, learns attention weights adaptively, and allows agents to better process information in dynamic environments. Reward functions are specifically designed to tackle exploration challenges in environments with sparse rewards. Model training is facilitated by a framework that integrates centralized training and distributed execution. Simulation results show that the proposed method outperforms the existing MARL method in search rate and tracking performance with fewer collisions. The experiments show that the proposed method can be extended to applications with a larger number of agents, which provides a potential solution to the challenging problem of multi-UAV autonomous tracking in dynamic unknown environments.
Funding: The National Natural Science Foundation of China (62136008, 62293541), the Beijing Natural Science Foundation (4232056), and the Beijing Nova Program (20240484514).
Abstract: Cooperative multi-agent reinforcement learning (MARL) is a key technology for enabling cooperation in complex multi-agent systems. It has achieved remarkable progress in areas such as gaming, autonomous driving, and multi-robot control. Empowering cooperative MARL with multi-task decision-making capabilities is expected to further broaden its application scope. In multi-task scenarios, cooperative MARL algorithms need to address 3 types of multi-task problems: reward-related multi-task, arising from different reward functions; multi-domain multi-task, caused by differences in state and action spaces and state transition functions; and scalability-related multi-task, resulting from the dynamic variation in the number of agents. Most existing studies focus on scalability-related multi-task problems. However, with the increasing integration between large language models (LLMs) and multi-agent systems, a growing number of LLM-based multi-agent systems have emerged, enabling more complex multi-task cooperation. This paper provides a comprehensive review of the latest advances in this field. By combining multi-task reinforcement learning with cooperative MARL, we categorize and analyze the 3 major types of multi-task problems under multi-agent settings, offering more fine-grained classifications and summarizing key insights for each. In addition, we summarize commonly used benchmarks and discuss future directions of research in this area, which hold promise for further enhancing the multi-task cooperation capabilities of multi-agent systems and expanding their practical applications in the real world.
Funding: Supported by the Natural Science Foundation of Jiangsu Province, China (No. BK20240200), in part by the National Natural Science Foundation of China (Nos. 62271501, 62071488, 62471489 and U22B2002), in part by the Key Technologies R&D Program of Jiangsu, China (Prospective and Key Technologies for Industry) (Nos. BE2023022 and BE2023022-4), and in part by the Post-doctoral Fellowship Program of CPSF, China (No. GZB20240996).
Abstract: Due to the characteristics of line-of-sight (LoS) communication in unmanned aerial vehicle (UAV) networks, these systems are highly susceptible to eavesdropping and surveillance. To effectively address the security concerns in UAV communication, covert communication methods have been adopted. This paper explores the joint optimization problem of trajectory and transmission power in a multi-hop UAV relay covert communication system. Considering the communication covertness, power constraints, and trajectory limitations, an algorithm based on multi-agent proximal policy optimization (MAPPO), named covert-MAPPO (C-MAPPO), is proposed. The proposed method leverages the strengths of both optimization algorithms and reinforcement learning to analyze and make joint decisions on the transmission power and flight trajectory strategies for UAVs to achieve cooperation. Simulation results demonstrate that the proposed method can maximize the system throughput while satisfying covertness constraints, and it outperforms benchmark algorithms in terms of system throughput and reward convergence speed.