Abstract: In this paper, a zero-sum game Nash equilibrium computation problem with a common constraint set is investigated over two time-varying multi-agent subnetworks, where the two subnetworks have opposite payoff functions. A novel distributed projection subgradient algorithm with a random sleep scheme is developed to reduce the computational burden on agents while computing the Nash equilibrium. In our algorithm, each agent follows an independent, identically distributed Bernoulli decision either to compute the subgradient and perform the projection operation or to keep its previous consensus estimate, which effectively reduces both the amount of computation and the computation time. Moreover, the stepsize assumption traditionally adopted in existing methods is removed, and the stepsizes in our algorithm are randomized and diminishing. We prove that, under our algorithm, all agents converge to the Nash equilibrium with probability 1. Finally, a simulation example verifies the validity of the algorithm.
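A minimal numerical sketch of the random-sleep update described above may help fix ideas. The toy problem, the static doubly stochastic mixing matrix W, the wake probability p_wake, and all numbers below are assumptions for illustration only; the paper's setting (two subnetworks with opposite payoffs over time-varying graphs) is richer, but the Bernoulli wake/sleep structure of the iteration is the same.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in problem (an assumption, not the paper's setup): n agents
# cooperatively minimize sum_i ||x - a_i||_1 over the box X = [-1, 1]^d.
d, n, T = 2, 4, 3000
a = rng.uniform(-1, 1, (n, d))            # each agent's private data a_i
p_wake = 0.5                              # Bernoulli wake probability
proj = lambda z: np.clip(z, -1.0, 1.0)    # projection onto the common set X

x = rng.uniform(-1, 1, (n, d))            # local estimates
W = np.full((n, n), 1.0 / n)              # doubly stochastic mixing weights

for k in range(1, T + 1):
    mixed = W @ x                         # consensus averaging step
    for i in range(n):
        if rng.random() < p_wake:         # i.i.d. Bernoulli wake decision
            g = np.sign(mixed[i] - a[i])  # subgradient of ||x - a_i||_1
            step = rng.uniform(0.5, 1.5) / k   # randomized diminishing stepsize
            x[i] = proj(mixed[i] - step * g)   # projected subgradient update
        else:
            x[i] = mixed[i]   # sleep: keep consensus estimate, no projection
```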
Abstract: In this paper, we consider multiobjective two-person zero-sum games with vector payoffs and vector fuzzy payoffs. We translate such games into the corresponding multiobjective programming problems and introduce the pessimistic Pareto optimal solution concept, assuming that each player supposes the opponent adopts the strategy most disadvantageous to them. It is shown that any pessimistic Pareto optimal solution can be obtained by linear programming techniques, even when the membership functions for the objective functions are nonlinear. Moreover, we propose interactive algorithms based on the bisection method to obtain a pessimistic compromise solution from the set of all pessimistic Pareto optimal solutions. To show the efficiency of the proposed method, we illustrate the interactive process on an application to a vegetable shipment problem.
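The LP core that this abstract builds on is the classical maximin (pessimistic) formulation of a zero-sum matrix game. The sketch below solves it for a made-up payoff matrix; the paper's multiobjective and fuzzy extensions reduce to linear programs of this same shape.

```python
import numpy as np
from scipy.optimize import linprog

# Pessimistic (maximin) mixed strategy for a zero-sum matrix game via LP.
# The payoff matrix A is an illustrative example, not data from the paper.
A = np.array([[ 3.0, -1.0,  2.0],
              [-2.0,  4.0,  0.0]])         # row player's payoffs
m, n = A.shape

# Variables z = [p_1..p_m, v]: maximize v s.t. (A^T p)_j >= v, 1^T p = 1, p >= 0.
c = np.zeros(m + 1); c[-1] = -1.0          # linprog minimizes, so minimize -v
A_ub = np.hstack([-A.T, np.ones((n, 1))])  # v - (A^T p)_j <= 0 for each column j
b_ub = np.zeros(n)
A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])
b_eq = np.array([1.0])
bounds = [(0, None)] * m + [(None, None)]  # p >= 0, v free

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
p, v = res.x[:m], res.x[-1]
print("maximin strategy:", p, "game value:", v)
```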
Abstract: Nowadays, China is the largest developing country in the world, and the US is the largest developed country. Sino-US economic and trade relations are of great significance to both nations and may have a prominent impact on the stability and development of the global economy.
Abstract: Only a few studies focus on solution methods for finding a Nash equilibrium of zero-sum games. We discuss the use of Karmarkar's interior-point method to solve the Nash equilibrium problem of a zero-sum game, and prove that it is theoretically a polynomial-time algorithm. We implement the Karmarkar method, and preliminary computational results show that it performs well on zero-sum games. We also mention an affine scaling method that can help compute Nash equilibria of general zero-sum games effectively.
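For completeness, the same maximin LP shown earlier can be handed to an interior-point solver in the spirit of this abstract. The snippet below is a hedged sketch: method="highs-ipm" selects SciPy's HiGHS interior-point solver, which merely stands in for the authors' Karmarkar implementation (not specified here), and the payoff matrix is a made-up example.

```python
import numpy as np
from scipy.optimize import linprog

# Zero-sum matrix game solved with an interior-point LP method.
A = np.array([[2.0, -1.0],
              [-1.0, 1.0]])                # illustrative payoff matrix
m, n = A.shape
c = np.r_[np.zeros(m), -1.0]               # minimize -v over z = [p; v]
A_ub = np.hstack([-A.T, np.ones((n, 1))])  # enforce (A^T p)_j >= v
res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n),
              A_eq=np.r_[np.ones(m), 0.0][None, :], b_eq=[1.0],
              bounds=[(0, None)] * m + [(None, None)],
              method="highs-ipm")          # HiGHS interior-point solver
print("value:", res.x[-1], "strategy:", res.x[:m])  # value 0.2, p = (0.4, 0.6)
```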
Funding: Supported by the National High Technology Research and Development Program of China (863 Program) (2006AA04Z183), the National Natural Science Foundation of China (60621001, 60534010, 60572070, 60774048, 60728307), and the Program for Changjiang Scholars and Innovative Research Groups of China (60728307, 4031002).
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 62122043, 62192753, 62433020, and T2293770, and the Natural Science Foundation of Shandong Province for Distinguished Young Scholars under Grant No. ZR2022JQ31.
Abstract: This paper considers value iteration algorithms for stochastic zero-sum linear quadratic games with unknown dynamics. On-policy and off-policy learning algorithms are developed to solve the stochastic zero-sum games without requiring the system dynamics. By analyzing the value function iterations, the convergence of the model-based algorithm is shown, and the equivalence of several types of value iteration algorithms is established. The effectiveness of the model-free algorithms is demonstrated by a numerical example.
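As a rough illustration of the value-function recursion being iterated, the following is a minimal deterministic zero-sum LQ sketch; all matrices and the attenuation level gamma are made-up assumptions, and the paper's stochastic, model-free (on/off-policy) variants build on the same fixed point rather than this simplified recursion.

```python
import numpy as np

# Model-based value iteration for a deterministic zero-sum LQ game
# x_{k+1} = A x_k + B u_k + D w_k with stage cost x'Qx + u'Ru - gamma^2 w'w.
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
D = np.array([[1.0], [0.0]])
Q, R, gamma = np.eye(2), np.eye(1), 5.0

P = np.zeros((2, 2))                        # VI starts from P_0 = 0
for _ in range(500):
    H = np.block([[R + B.T @ P @ B,               B.T @ P @ D],
                  [D.T @ P @ B,  D.T @ P @ D - gamma**2 * np.eye(1)]])
    L = np.vstack([B.T @ P @ A, D.T @ P @ A])
    P_next = Q + A.T @ P @ A - L.T @ np.linalg.solve(H, L)  # game Riccati map
    if np.max(np.abs(P_next - P)) < 1e-10:  # value function has converged
        P = P_next
        break
    P = P_next

# The saddle-point gains follow from the converged P as [K_u; K_w] = H^{-1} L.
print("game value matrix P:\n", P)
```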
Funding: Supported in part by the National Natural Science Foundation of China under Grant 62222301, Grant 61890930-5, and Grant 62021003; the National Science and Technology Major Project under Grant 2021ZD0112302 and Grant 2021ZD0112301; and the Beijing Natural Science Foundation under Grant JQ19013.
Abstract: In this paper, an accelerated value iteration (VI) algorithm with a convergence guarantee is established to solve the zero-sum game problem. First, inspired by successive over-relaxation theory, the convergence rate of the iterative value function sequence is significantly accelerated by a relaxation factor. Second, the convergence and monotonicity of the value function sequence are analyzed for different ranges of the relaxation factor. Third, two practical approaches, namely the integrated scheme and the relaxation function, are introduced into the accelerated VI algorithm to guarantee convergence of the iterative value function sequence for zero-sum games. The integrated scheme consists of an accelerated stage and a convergence stage, and the relaxation function adjusts the value of the relaxation factor. Finally, the performance of the accelerated VI algorithm is verified through two examples with practical physical backgrounds, including an autopilot controller.
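The acceleration mechanism can be illustrated on a toy problem. The sketch below applies an over-relaxed Bellman update V_{k+1} = (1 - omega) V_k + omega T(V_k) to a small random MDP; the MDP data and omega are assumptions, and omega is kept inside the conservative contraction range |1 - omega| + omega * gamma < 1 so convergence is guaranteed (the paper's analysis admits wider ranges for zero-sum games). Plain VI is recovered at omega = 1; omega > 1 extrapolates along the Bellman update, which is the acceleration mechanism described above.

```python
import numpy as np

rng = np.random.default_rng(1)
nS, nA, gamma = 6, 3, 0.6
omega = 1.2                                      # |1-omega| + omega*gamma = 0.92 < 1
P = rng.dirichlet(np.ones(nS), size=(nS, nA))    # P[s, a] = next-state distribution
r = rng.uniform(0, 1, (nS, nA))                  # stage rewards

def bellman(V):                                  # standard Bellman operator T
    return np.max(r + gamma * P @ V, axis=1)

V = np.zeros(nS)
for k in range(10_000):
    V_new = (1 - omega) * V + omega * bellman(V)  # relaxed (accelerated) update
    if np.max(np.abs(V_new - V)) < 1e-12:
        break
    V = V_new
print(f"converged in {k} iterations")
```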
Funding: Supported by the National Natural Science Foundation of China (Nos. 61973070 and 62373089), the Natural Science Foundation of Liaoning Province, China (No. 2022JH25/10100008), and the SAPI Fundamental Research Funds, China (No. 2018ZCX22).
Abstract: A neurodynamic method (NdM) for convex optimization with an equality constraint is proposed in this paper. The method utilizes a neurodynamic system (NdS) that converges to the optimal solution of a convex optimization problem in fixed time. Owing to its mathematical simplicity, it can also be combined with reinforcement learning (RL) to solve a class of nonconvex optimization problems. To maintain the mathematical simplicity of the NdS, zero-sum initial constraints are introduced to reduce the number of auxiliary multipliers: first, the initial sum of the state variables must satisfy the equality constraint; second, the sum of their derivatives is designed to remain zero. To apply the proposed convex optimization algorithm to nonconvex optimization with mixed constraints, the virtual actions in RL are redefined to avoid the use of NdS inequality-constraint multipliers. The proposed NdM serves as an effective search tool in constrained nonconvex optimization algorithms. Numerical examples demonstrate the effectiveness of the proposed algorithm.
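The zero-sum initial constraint idea admits a compact sketch: for the constraint 1^T x = b, start from a feasible point and project the gradient onto the constraint's null space, so the state derivatives always sum to zero and 1^T x = b holds along the whole trajectory without any multiplier. The objective, data, and the plain (asymptotic) Euler-integrated gradient flow below are assumptions for illustration; the paper uses fixed-time convergent dynamics instead.

```python
import numpy as np

# Minimize f(x) = 0.5 x'Qx subject to 1^T x = b via a multiplier-free flow.
n, b, dt = 4, 2.0, 1e-3
Qm = np.diag([1.0, 2.0, 3.0, 4.0])        # convex quadratic objective (made up)
grad = lambda x: Qm @ x
Pn = np.eye(n) - np.ones((n, n)) / n      # projector onto {z : 1^T z = 0}

x = np.full(n, b / n)                     # feasible start: 1^T x = b ("zero-sum" init)
for _ in range(200_000):
    dx = -Pn @ grad(x)                    # derivatives sum to zero by construction
    x += dt * dx                          # forward-Euler step of the flow

print("x* =", x, " sum =", x.sum())       # sum stays b; x* solves the KKT system
```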
Funding: Supported in part by the National Key R&D Program of China (No. 2021YFE0206100); the National Natural Science Foundation of China (Nos. 62073321 and 62273036); the National Defense Basic Scientific Research Program (No. JCKY2019203C029); the Science and Technology Development Fund, Macao SAR (Nos. FDCT-22-009-MISE, 0060/2021/A2, and 0015/2020/AMJ); and the State Key Lab of Rail Traffic Control & Safety (No. RCS2021K005).
Abstract: In this paper, based on the ACP approach (artificial societies, computational experiments, and parallel execution), a parallel control method is proposed for zero-sum games of unknown time-varying systems. The process of constructing a sequence of artificial systems, implementing the computational experiments, and conducting the parallel execution is presented. The artificial systems are constructed to model the real system; computational experiments adopting adaptive dynamic programming (ADP) derive control laws for the sequence of artificial systems; and the parallel execution step derives the control laws for the real system. Finally, simulation experiments are provided to show the effectiveness of the proposed method.