Funding: Supported by the Deanship of Scientific Research at King Fahd University of Petroleum & Minerals Project (No. JF141002), the National Science Foundation (No. ECCS-1405173), the Office of Naval Research (Nos. N000141310562 and N000141410718), the U.S. Army Research Office (No. W911NF-11-D-0001), the National Natural Science Foundation of China (No. 61120106011), and Project 111 from the Ministry of Education of China (No. B08015).
Abstract: This paper introduces a model-free reinforcement learning technique that is used to solve a class of dynamic games known as dynamic graphical games. The graphical game results from multi-agent dynamical systems, where pinning control is used to make all the agents synchronize to the state of a command generator or a leader agent. Novel coupled Bellman equations and Hamiltonian functions are developed for the dynamic graphical games. The Hamiltonian mechanics are used to derive the necessary conditions for optimality. The solution for the dynamic graphical game is given in terms of the solution to a set of coupled Hamilton-Jacobi-Bellman (HJB) equations developed herein, and the Nash equilibrium of the graphical game is characterized by the solution to these coupled HJB equations. An online model-free policy iteration algorithm is developed to learn the Nash solution for the dynamic graphical game. This algorithm does not require any knowledge of the agents' dynamics. A proof of convergence for this multi-agent learning algorithm is given under a mild assumption about the interconnectivity properties of the graph. A gradient descent technique with critic network structures is used to implement the policy iteration algorithm and solve the graphical game online in real time.
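The abstract does not reproduce the equations; as a hedged illustration, the local neighborhood tracking error and the coupled Bellman equation for discrete-time graphical games of this kind typically take the following form (the symbols x_i, u_i, the edge weights e_ij, and the pinning gain g_i are assumptions for illustration, not taken from the paper):

\[
\delta_i(k) \;=\; \sum_{j \in N_i} e_{ij}\,\bigl(x_i(k) - x_j(k)\bigr) \;+\; g_i\,\bigl(x_i(k) - x_0(k)\bigr),
\]
\[
V_i\bigl(\delta_i(k)\bigr) \;=\; \tfrac{1}{2}\Bigl(\delta_i^{\mathsf T} Q_{ii}\,\delta_i + u_i^{\mathsf T} R_{ii}\,u_i + \sum_{j \in N_i} u_j^{\mathsf T} R_{ij}\,u_j\Bigr) \;+\; V_i\bigl(\delta_i(k+1)\bigr),
\]

where x_0 is the leader (command generator) state and V_i is agent i's value function; the coupled HJB equations mentioned in the abstract follow by minimizing the right-hand side over u_i.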
Abstract: In this paper, an online optimal distributed learning algorithm is proposed to solve the leader-synchronization problem of nonlinear multi-agent differential graphical games. Each player approximates its optimal control policy using single-network approximate dynamic programming (ADP), in which only one critic neural network (NN) is employed instead of the typical actor-critic structure composed of two NNs. The proposed distributed weight tuning laws for the critic NNs guarantee stability in the sense of uniform ultimate boundedness (UUB) and convergence of the control policies to the Nash equilibrium. By introducing novel distributed local operators into the weight tuning laws, the requirement for initial stabilizing control policies is removed. Furthermore, overall closed-loop system stability is guaranteed by Lyapunov stability analysis. Finally, simulation results show the effectiveness of the proposed algorithm.
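As a hedged sketch (the specific tuning law is not given in the abstract), a single-critic approximation and a normalized gradient-style weight update of the kind described usually look like:

\[
\hat V_i(\delta_i) \;=\; \hat W_i^{\mathsf T}\,\phi_i(\delta_i),
\qquad
\dot{\hat W}_i \;=\; -\,\alpha_i\,\frac{\sigma_i}{\bigl(1+\sigma_i^{\mathsf T}\sigma_i\bigr)^{2}}\; e_{H,i},
\]

where \(\phi_i\) is a basis (activation) vector, \(e_{H,i}\) is agent i's local Bellman/Hamiltonian approximation error, \(\sigma_i = \partial e_{H,i}/\partial \hat W_i\), and \(\alpha_i > 0\) is a learning gain; the distributed local operators introduced in the paper modify an update of this kind so that no initial stabilizing policy is needed.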
Abstract: In this paper, a zero-sum game Nash equilibrium computation problem with a common constraint set is investigated over two time-varying multi-agent subnetworks, where the two subnetworks have opposite payoff functions. A novel distributed projection subgradient algorithm with a random sleep scheme is developed to reduce the computational load on the agents while computing the Nash equilibrium. In our algorithm, each agent draws an independent and identically distributed Bernoulli decision to either compute the subgradient and perform the projection operation or keep its previous consensus estimate, which effectively reduces the amount of computation and the computation time. Moreover, the traditional stepsize assumption adopted in existing methods is removed, and the stepsizes in our algorithm are randomized and diminishing. We prove that all agents converge to the Nash equilibrium with probability 1 under our algorithm. Finally, a simulation example verifies the validity of the algorithm.
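A minimal toy sketch of the random-sleep projected-subgradient update described above (plain distributed minimization over a box constraint, not the full two-subnetwork zero-sum setting; the network, local costs, and stepsizes are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n_agents, dim, T = 5, 2, 2000
p_wake = 0.6                                   # Bernoulli wake probability
targets = rng.normal(size=(n_agents, dim))     # agent i holds f_i(x) = ||x - targets[i]||^2
lo, hi = -1.0, 1.0                             # common constraint set: a box

# doubly stochastic mixing matrix for a ring graph
W = np.zeros((n_agents, n_agents))
for i in range(n_agents):
    W[i, i] = 0.5
    W[i, (i - 1) % n_agents] = 0.25
    W[i, (i + 1) % n_agents] = 0.25

x = rng.uniform(lo, hi, size=(n_agents, dim))  # local estimates
for k in range(1, T + 1):
    mixed = W @ x                                       # consensus step over the graph
    step = (1.0 / np.sqrt(k)) * rng.uniform(0.5, 1.5)   # randomized diminishing stepsize
    new_x = x.copy()
    for i in range(n_agents):
        if rng.random() < p_wake:              # wake: subgradient step + projection
            grad = 2.0 * (mixed[i] - targets[i])
            new_x[i] = np.clip(mixed[i] - step * grad, lo, hi)
        else:                                  # sleep: no subgradient or projection this round
            new_x[i] = mixed[i]
    x = new_x

print("consensus estimates:\n", x)
print("minimizer of the sum (projected mean of targets):", np.clip(targets.mean(axis=0), lo, hi))
```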
Abstract: This article formulates interactive adversarial differential graphical games for synchronization control of multiagent systems (MASs) subject to adversarial inputs interacting with the systems through topology communications. Local control and interactive adversarial inputs affect each agent's local synchronization error via local networks. The distributed global Nash equilibrium (NE) solutions are guaranteed in the games by solving the optimal control input of each agent and the worst-case adversarial input based solely on local states and communications. The asymptotic stability of the local synchronization error dynamics and the NE are guaranteed. Furthermore, the authors devise a data-driven online reinforcement learning (RL) algorithm that computes the distributed Nash control online using only system trajectory data, eliminating the need for explicit system dynamics. A simulation-based example validates the game and algorithm.
Abstract: Nowadays, China is the largest developing country in the world, and the US is the largest developed country in the world. Sino-US economic and trade relations are of great significance to the two nations and may have a prominent impact on the stability and development of the global economy.
Abstract: In this paper, we consider multiobjective two-person zero-sum games with vector payoffs and vector fuzzy payoffs. We translate such games into the corresponding multiobjective programming problems and introduce the pessimistic Pareto optimal solution concept by assuming that each player supposes the opponent adopts the most disadvantageous strategy against them. It is shown that any pessimistic Pareto optimal solution can be obtained using linear programming techniques even if the membership functions for the objective functions are nonlinear. Moreover, we propose interactive algorithms based on the bisection method to obtain a pessimistic compromise solution from the set of all pessimistic Pareto optimal solutions. To show the efficiency of the proposed method, we illustrate the interactive process with an application to a vegetable shipment problem.
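To make the pessimistic notion concrete, here is a hedged sketch for the crisp (non-fuzzy) matrix-game case, with notation assumed rather than taken from the paper: for mixed strategies \(x \in \Delta_n\) of player 1 and payoff matrices \(A_1, \dots, A_r\), the pessimistic value of the k-th objective is the worst case over the opponent's mixed strategies,

\[
v_k(x) \;=\; \min_{y \in \Delta_m} x^{\mathsf T} A_k\, y \;=\; \min_{1 \le j \le m} \bigl(A_k^{\mathsf T} x\bigr)_j , \qquad k = 1,\dots,r,
\]

and \(x^\ast\) is pessimistic Pareto optimal when no feasible \(x\) improves every \(v_k(x^\ast)\) simultaneously. Each scalarized subproblem \(\max_x \sum_k \lambda_k v_k(x)\) becomes a linear program after introducing auxiliary variables \(t_k \le (A_k^{\mathsf T} x)_j\), which is consistent with the abstract's claim that linear programming techniques suffice.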
Abstract: There are a few studies that focus on solution methods for finding a Nash equilibrium of zero-sum games. We discuss the use of Karmarkar's interior point method to solve the Nash equilibrium problems of a zero-sum game, and prove that it is theoretically a polynomial-time algorithm. We implement the Karmarkar method, and a preliminary computational result shows that it performs well for zero-sum games. We also mention an affine scaling method that would help us compute Nash equilibria of general zero-sum games effectively.
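For reference, a small hedged sketch of the LP route: the row player's maximin strategy for a payoff matrix A solves a linear program, which can be handed to any interior-point LP solver (here SciPy's HiGHS interior-point method as a stand-in for Karmarkar's algorithm; the matrix is made up):

```python
import numpy as np
from scipy.optimize import linprog

A = np.array([[ 1.0, -1.0,  2.0],     # row player's payoff matrix (illustrative)
              [ 0.0,  1.0, -2.0],
              [-1.0,  2.0,  0.0]])
n, m = A.shape

# Variables z = (x_1..x_n, v): maximize v  s.t.  A^T x >= v*1,  sum(x) = 1,  x >= 0.
c = np.zeros(n + 1); c[-1] = -1.0                       # linprog minimizes, so minimize -v
A_ub = np.hstack([-A.T, np.ones((m, 1))])               # v*1 - A^T x <= 0
b_ub = np.zeros(m)
A_eq = np.hstack([np.ones((1, n)), np.zeros((1, 1))])   # probabilities sum to one
b_eq = np.array([1.0])
bounds = [(0.0, None)] * n + [(None, None)]             # x >= 0, v free

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=bounds, method="highs-ipm")        # interior-point flavour of HiGHS
x_star, value = res.x[:n], res.x[-1]
print("row strategy:", np.round(x_star, 4), "game value:", round(value, 4))
```

The column player's strategy follows from the symmetric minimax LP, or from the dual variables of the same program.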
Abstract: In 3D games, many weapons drag a "follow the shadow" effect behind them as they move, which is called the "track". In this paper, we first analyze how the "track" changes over time, and then put forward an algorithm to realize it. The computation required by this algorithm is small, yet the effect is very realistic, and it has been successfully applied in a variety of 3D games.
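The abstract gives no pseudocode; a hedged sketch of one common way such a "track" is built (a ring buffer of recent weapon-edge samples turned into a fading ribbon; the class name and parameters are invented for illustration and are not the paper's algorithm):

```python
from collections import deque

class WeaponTrail:
    """Keep the last N samples of the weapon edge and emit a fading ribbon strip."""

    def __init__(self, max_samples=16):
        self.samples = deque(maxlen=max_samples)   # each sample: (root_xyz, tip_xyz)

    def record(self, root, tip):
        """Call once per frame with the weapon's root and tip positions in world space."""
        self.samples.append((root, tip))

    def ribbon(self):
        """Return (vertex, alpha) pairs; newest samples are opaque, oldest fade out."""
        n = len(self.samples)
        strip = []
        for i, (root, tip) in enumerate(self.samples):
            alpha = (i + 1) / n                    # linear fade from oldest to newest
            strip.append((root, alpha))
            strip.append((tip, alpha))
        return strip

# usage: feed per-frame positions, then hand the strip to the renderer as a triangle strip
trail = WeaponTrail(max_samples=8)
for t in range(10):
    trail.record(root=(t * 0.1, 0.0, 0.0), tip=(t * 0.1, 1.0, 0.0))
print(len(trail.ribbon()), "ribbon vertices")      # 16 = 8 samples x 2 vertices
```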
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 62122043, 62192753, 62433020, and T2293770, and the Natural Science Foundation of Shandong Province for Distinguished Young Scholars under Grant No. ZR2022JQ31.
Abstract: This paper considers value iteration algorithms for stochastic zero-sum linear quadratic games with unknown dynamics. On-policy and off-policy learning algorithms are developed to solve the stochastic zero-sum games without requiring the system dynamics. By analyzing the value function iterations, the convergence of the model-based algorithm is shown. The equivalence of several types of value iteration algorithms is established. The effectiveness of the model-free algorithms is demonstrated by a numerical example.
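A hedged sketch of the model-based value-iteration recursion for the deterministic discrete-time zero-sum LQ special case (the stochastic and model-free variants in the paper differ; the matrices below are illustrative assumptions):

```python
import numpy as np

# Illustrative system x_{k+1} = A x + B u + D w with cost x'Qx + u'Ru - gamma^2 w'w
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
D = np.array([[0.1], [0.0]])
Q = np.eye(2); R = np.eye(1); gamma = 2.0

P = np.zeros((2, 2))                       # value iteration starts from V_0 = 0
for _ in range(500):
    PA, PB, PD = P @ A, P @ B, P @ D
    M = np.block([[R + B.T @ PB, B.T @ PD],
                  [D.T @ PB,     D.T @ PD - gamma**2 * np.eye(1)]])
    N = np.vstack([B.T @ PA, D.T @ PA])
    P_next = Q + A.T @ PA - N.T @ np.linalg.solve(M, N)   # min-max Bellman backup
    if np.max(np.abs(P_next - P)) < 1e-10:
        P = P_next
        break
    P = P_next

# Saddle-point gains u = -K x and w = -L x recovered from the converged kernel P
KL = np.linalg.solve(M, N)
print("P =\n", np.round(P, 4))
print("controller gain K =", np.round(KL[:1], 4), " disturbance gain L =", np.round(KL[1:], 4))
```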
Abstract: Minimax algorithms and machine learning technologies have been studied for decades to reach ideal optimization in games such as chess and backgammon. In these fields, several generations have tried to optimize the code for pruning and for the effectiveness of the evaluation function, so there are well-developed algorithms for handling the sophisticated situations that arise in play. However, as a traditional zero-sum game, Connect-4 has received less attention than other members of the zero-sum family when approached with the traditional minimax algorithm. In recent years, a new generation of heuristics has been created to address this problem based on research conclusions, expertise, and gaming experience. This paper mainly introduces a self-developed heuristic, supported by well-demonstrated results from prior research and by our own experience playing against an available online version of the Connect-4 system. While most previous works focused on winning algorithms and knowledge-based approaches, we complement them with an analysis of heuristics. We conducted three experiments on the relationship among functionality, search depth, and number of features, and ran contrastive tests against the online sample. Unlike the sample, which is based on summarized experience and generalized features, our heuristic concentrates on the detailed connections between pieces on the board. By analyzing the winning percentages when our version plays against the online sample at different search depths, we find that our heuristic combined with the minimax algorithm is perfect in the early stages of this zero-sum game. Because some nodes in the game tree have no influence on the final decision of the minimax algorithm, we use alpha-beta pruning to decrease the number of meaningless nodes, which greatly increases minimax efficiency. In the contrastive experiment with the online sample, this paper also verifies basic characteristics of the minimax algorithm, namely search depth and the number of features. According to the experiments, these two characteristics both affect the decision at each step, and neither is absolutely dominant. Besides, we also explore some potential future issues in Connect-4 optimization, such as precise adjustment of heuristic values and pruning of inefficient branches in the search tree.
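A hedged, stripped-down sketch of depth-limited minimax with alpha-beta pruning of the kind described above (the game interface and toy tree are placeholders, not the paper's Connect-4 heuristics):

```python
import math

def alphabeta(state, depth, alpha, beta, maximizing, game):
    """Depth-limited minimax with alpha-beta pruning.

    `game` must provide: is_terminal(state), evaluate(state) -> float (heuristic),
    moves(state) -> iterable, and apply(state, move) -> new state.  These are
    placeholder hooks, not the interface of any particular Connect-4 engine.
    """
    if depth == 0 or game.is_terminal(state):
        return game.evaluate(state), None

    best_move = None
    if maximizing:
        value = -math.inf
        for move in game.moves(state):
            child, _ = alphabeta(game.apply(state, move), depth - 1, alpha, beta, False, game)
            if child > value:
                value, best_move = child, move
            alpha = max(alpha, value)
            if alpha >= beta:          # beta cutoff: opponent will never allow this branch
                break
    else:
        value = math.inf
        for move in game.moves(state):
            child, _ = alphabeta(game.apply(state, move), depth - 1, alpha, beta, True, game)
            if child < value:
                value, best_move = child, move
            beta = min(beta, value)
            if beta <= alpha:          # alpha cutoff
                break
    return value, best_move

# toy usage on a tiny hand-built game tree (leaves hold heuristic values)
class TreeGame:
    def is_terminal(self, s): return not isinstance(s, list)
    def evaluate(self, s): return s if not isinstance(s, list) else 0
    def moves(self, s): return range(len(s))
    def apply(self, s, m): return s[m]

tree = [[3, 5], [6, [9, 1]], [2, 8]]
print(alphabeta(tree, depth=4, alpha=-math.inf, beta=math.inf, maximizing=True, game=TreeGame()))
```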
Abstract: The missile interception problem can be regarded as a two-person zero-sum differential game, whose solution depends on the Hamilton-Jacobi-Isaacs (HJI) equation. It has been proved impossible to obtain a closed-form solution due to the nonlinearity of the HJI equation, and many iterative algorithms have been proposed to solve it. The simultaneous policy updating algorithm (SPUA) is an effective algorithm for solving the HJI equation, but it is an on-policy integral reinforcement learning (IRL) method: for online implementation of SPUA, the disturbance signals need to be adjustable, which is unrealistic. In this paper, an off-policy IRL algorithm based on SPUA is proposed without making use of any knowledge of the system dynamics. A neural-network based online adaptive critic implementation scheme of the off-policy IRL algorithm is then presented. Based on the online off-policy IRL method, a computational intelligence interception guidance (CIIG) law is developed for intercepting high-maneuvering targets. As a model-free method, interception can be achieved by measuring system data online. The effectiveness of the CIIG law is verified through two missile-target engagement scenarios.
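For orientation, a hedged statement of the HJI equation the iterations target, written for a generic input-affine pursuit-evasion formulation (the notation f, g, k, R, and the attenuation level gamma are assumptions, not the paper's):

\[
0 \;=\; \min_{u}\,\max_{d}\;\Bigl[\, Q(x) \;+\; u^{\mathsf T} R\, u \;-\; \gamma^{2} d^{\mathsf T} d
\;+\; \nabla V^{\mathsf T}(x)\,\bigl(f(x) + g(x)\,u + k(x)\,d\bigr) \Bigr],
\]

with saddle-point policies \(u^{\ast} = -\tfrac{1}{2} R^{-1} g^{\mathsf T}\nabla V\) and \(d^{\ast} = \tfrac{1}{2\gamma^{2}} k^{\mathsf T}\nabla V\); SPUA and its off-policy IRL variant iterate on approximations of \(V\) without solving this partial differential equation in closed form.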
Funding: Co-supported by the National Natural Science Foundation of China (Nos. 62022060, 62073234, 62073158, 62373268, and 62373273) and the Basic Research Project of the Education Department of Liaoning Province, China (No. LJKZ0401).
Abstract: The helicopter Trailing-Edge Flaps (TEFs) technology is one of the recent hot topics in morphing wing research. By employing controlled deflection, TEFs can effectively reduce the vibration level of helicopters. Thus, designing vibration reduction control methods specifically for helicopters equipped with trailing-edge flaps is of significant practical value. This paper studies the optimal control problem for helicopter vibration systems with TEFs under the framework of adaptive dynamic programming combined with Reinforcement Learning (RL). Time delay and disturbances, caused by the complexity of helicopter dynamics, inevitably deteriorate the vibration reduction performance. To solve this problem, a zero-sum game formulation with a linear quadratic form for reducing helicopter vibration is presented together with a virtual predictor. In this context, an off-policy reinforcement learning algorithm is developed to determine the optimal control policy. The algorithm uses only vertical vibration load data to obtain a policy that reduces vibration, attains the Nash equilibrium, and addresses disturbances while compensating for the time delay, without knowledge of the helicopter system dynamics. The effectiveness of the proposed method is demonstrated on a virtual platform.
Funding: Supported in part by the National Natural Science Foundation of China under Grants 62222301, 61890930-5, and 62021003, the National Science and Technology Major Project under Grants 2021ZD0112302 and 2021ZD0112301, and the Beijing Natural Science Foundation under Grant JQ19013.
Abstract: In this paper, an accelerated value iteration (VI) algorithm with a convergence guarantee is established to solve the zero-sum game problem. First, inspired by successive over-relaxation theory, the convergence rate of the iterative value function sequence is significantly accelerated by the relaxation factor. Second, the convergence and monotonicity of the value function sequence are analyzed for different ranges of the relaxation factor. Third, two practical approaches, namely the integrated scheme and the relaxation function, are introduced into the accelerated VI algorithm to guarantee the convergence of the iterative value function sequence for zero-sum games. The integrated scheme consists of an accelerated stage and a convergence stage, and the relaxation function adjusts the value of the relaxation factor. Finally, the excellent performance of the accelerated VI algorithm is verified through two examples with practical physical backgrounds, including an autopilot controller.
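A hedged sketch of the over-relaxation idea (the exact operator and the admissible range of the relaxation factor are established in the paper; omega and the Bellman operator T below are generic placeholders):

\[
V_{k+1} \;=\; (1-\omega)\,V_{k} \;+\; \omega\,\mathcal{T}\bigl[V_{k}\bigr],
\qquad
\mathcal{T}[V](x) \;=\; \min_{u}\,\max_{w}\;\Bigl\{\, U(x,u,w) \;+\; V\bigl(F(x,u,w)\bigr) \Bigr\},
\]

so \(\omega = 1\) recovers standard value iteration, while a suitably chosen \(\omega > 1\) (the accelerated stage) speeds up convergence, and the relaxation function shrinks \(\omega\) back toward values that guarantee convergence.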
Funding: Supported in part by the National Key R&D Program of China (No. 2021YFE0206100), the National Natural Science Foundation of China (Nos. 62073321 and 62273036), the National Defense Basic Scientific Research Program (No. JCKY2019203C029), the Science and Technology Development Fund, Macao SAR (Nos. FDCT-22-009-MISE and 0060/2021/A20015/2020/AMJ), and the State Key Lab of Rail Traffic Control & Safety (No. RCS2021K005).
Abstract: In this paper, based on the ACP approach (artificial societies, computational experiments, and parallel execution), a parallel control method is proposed for zero-sum games of unknown time-varying systems. The process of constructing a sequence of artificial systems, implementing the computational experiments, and conducting the parallel execution is presented. The artificial systems are constructed to model the real system. Computational experiments adopting adaptive dynamic programming (ADP) are carried out to derive control laws for the sequence of artificial systems. The purpose of the parallel execution step is to derive the control laws for the real system. Finally, simulation experiments are provided to show the effectiveness of the proposed method.