In repeated zero-sum games, instead of constantly playing an equilibrium strategy of the stage game, learning to exploit the opponent from historical interactions can typically yield a higher utility. However, when playing against a fully adaptive opponent, one has difficulty identifying the opponent's adaptive dynamics and exploiting its potential weaknesses. In this paper, we study the problem of optimizing against an adaptive opponent who uses no-regret learning, a classic and widely used family of adaptive learning algorithms. We propose a general framework for modeling no-regret opponents online and exploiting their weaknesses. With this framework, one can approximate the opponent's no-regret learning dynamics and then develop a response plan that obtains a significant profit based on inferences of the opponent's strategies. We employ two system identification architectures, the recurrent neural network (RNN) and the nonlinear autoregressive exogenous model, and adopt an efficient greedy response plan within the framework. Theoretically, we prove that our RNN architecture can approximate specific no-regret dynamics. Empirically, we demonstrate that, during interactions with a low level of non-stationarity, our architectures approximate the dynamics with low error, and the derived policies exploit the no-regret opponent to obtain a decent utility.
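The exploitation loop described above can be made concrete with a small sketch. The names and the choice of Hedge (multiplicative weights) as the opponent's no-regret rule are illustrative assumptions, not the paper's actual architectures: the paper identifies the dynamics with an RNN or a NARX model, while this sketch assumes the opponent's update rule is known and simply best-responds to the predicted strategy.

```python
import numpy as np

def hedge_update(w, payoffs, eta=0.1):
    """One multiplicative-weights (Hedge) step: reweight actions by payoff."""
    w = w * np.exp(eta * payoffs)
    return w / w.sum()

def greedy_response(A, opp_mixed):
    """Row player's best response to the opponent's predicted mixed strategy."""
    return int(np.argmax(A @ opp_mixed))

# Rock-paper-scissors payoff matrix for the row player.
A = np.array([[0., -1., 1.],
              [1., 0., -1.],
              [-1., 1., 0.]])

rng = np.random.default_rng(0)
opp = np.ones(3) / 3                   # opponent's (column player's) mixed strategy
avg_utility = 0.0
T = 2000
for t in range(T):
    a = greedy_response(A, opp)        # exploit the inferred no-regret dynamics
    b = rng.choice(3, p=opp)           # opponent samples from its current strategy
    avg_utility += A[a, b] / T
    opp = hedge_update(opp, -A[a, :])  # column player's payoff vector is -A[a, :]
print(avg_utility)
```

Replacing the exact `hedge_update` with a learned prediction of the opponent's next strategy recovers the structure of the paper's framework.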
In this paper, a zero-sum game Nash equilibrium computation problem with a common constraint set is investigated under two time-varying multi-agent subnetworks, where the two subnetworks have opposite payoff functions. A novel distributed projection subgradient algorithm with a random sleep scheme is developed to reduce the computational load on agents in the process of computing the Nash equilibrium. In our algorithm, each agent uses an independent identically distributed Bernoulli decision to either compute the subgradient and perform the projection operation or keep its previous consensus estimate, which effectively reduces the amount of computation and the calculation time. Moreover, the traditional stepsize assumption adopted in existing methods is removed, and the stepsizes in our algorithm are randomized and diminishing. We prove that all agents converge to the Nash equilibrium with probability 1 under our algorithm. Finally, a simulation example verifies the validity of the algorithm.
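As a hedged illustration of the random sleep mechanism alone (not the paper's algorithm: the game-theoretic structure, common constraint set, and time-varying subnetworks are simplified away), the sketch below applies Bernoulli-gated projected subgradient steps to a toy distributed minimization problem: four agents on a ring jointly minimize a sum of quadratics over an interval. The function name, network, and problem instance are our assumptions.

```python
import numpy as np

def random_sleep_subgradient(targets, W, rounds=500, p=0.7, seed=0):
    """Agent i holds local cost (x - targets[i])^2; the network minimizes the sum.
    With probability p an agent mixes neighbours' estimates and takes a projected
    subgradient step; otherwise it sleeps and keeps its previous estimate."""
    rng = np.random.default_rng(seed)
    n = len(targets)
    x = np.zeros(n)
    for k in range(1, rounds + 1):
        step = 1.0 / np.sqrt(k)                    # diminishing stepsize
        mix = W @ x                                # consensus with neighbours
        awake = rng.random(n) < p                  # i.i.d. Bernoulli decisions
        grad = 2 * (mix - targets)                 # subgradient of each local cost
        upd = np.clip(mix - step * grad, -10, 10)  # projection onto [-10, 10]
        x = np.where(awake, upd, x)                # sleeping agents keep old state
    return x

# Four agents on a ring; W is doubly stochastic.
W = np.array([[0.5, 0.25, 0.0, 0.25],
              [0.25, 0.5, 0.25, 0.0],
              [0.0, 0.25, 0.5, 0.25],
              [0.25, 0.0, 0.25, 0.5]])
targets = np.array([1.0, 2.0, 3.0, 6.0])
est = random_sleep_subgradient(targets, W)
print(est)  # all estimates approach mean(targets) = 3.0
```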
In this paper, we consider multiobjective two-person zero-sum games with vector payoffs and vector fuzzy payoffs. We translate such games into the corresponding multiobjective programming problems and introduce the pessimistic Pareto optimal solution concept by assuming that a player supposes the opponent adopts the most disadvantageous strategy for that player. It is shown that any pessimistic Pareto optimal solution can be obtained on the basis of linear programming techniques, even if the membership functions for the objective functions are nonlinear. Moreover, we propose interactive algorithms based on the bisection method to obtain a pessimistic compromise solution from among the set of all pessimistic Pareto optimal solutions. To show the efficiency of the proposed method, we illustrate the interactive process in an application to a vegetable shipment problem.
Nowadays, China is the largest developing country in the world, and the US is the largest developed country. Sino-US economic and trade relations are of great significance to the two nations and may have a prominent impact on the stability and development of the global economy.
There are only a few studies that focus on solution methods for finding a Nash equilibrium of zero-sum games. We discuss the use of Karmarkar's interior point method to solve the Nash equilibrium problem of a zero-sum game and prove that it is theoretically a polynomial-time algorithm. We implement the Karmarkar method, and a preliminary computational result shows that it performs well for zero-sum games. We also mention an affine scaling method that would help us compute Nash equilibria of general zero-sum games effectively.
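The equilibrium problem being solved can be written as a linear program, which is the formulation that interior-point methods such as Karmarkar's operate on. The sketch below is an illustration under assumptions: it uses SciPy's HiGHS solver rather than an implementation of Karmarkar's algorithm, and the helper name `zero_sum_equilibrium` is ours.

```python
import numpy as np
from scipy.optimize import linprog

def zero_sum_equilibrium(A):
    """Row player's maximin strategy for payoff matrix A via linear programming:
    max v  s.t.  x^T A >= v * 1,  sum(x) = 1,  x >= 0."""
    m, n = A.shape
    c = np.zeros(m + 1)
    c[-1] = -1.0                               # minimize -v  <=>  maximize v
    A_ub = np.hstack([-A.T, np.ones((n, 1))])  # v - (x^T A)_j <= 0 for each column j
    b_ub = np.zeros(n)
    A_eq = np.concatenate([np.ones(m), [0.0]]).reshape(1, -1)
    b_eq = [1.0]                               # x lies on the probability simplex
    bounds = [(0, None)] * m + [(None, None)]  # x >= 0, game value v is free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    return res.x[:m], res.x[-1]

# Matching pennies: unique equilibrium is the uniform strategy with value 0.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
x, v = zero_sum_equilibrium(A)
print(x, v)  # x ≈ [0.5, 0.5], v ≈ 0
```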
To keep the secrecy performance from being badly degraded by untrusted relays (URs), a multi-UR network using an amplify-and-forward (AF) cooperative scheme is put forward, which takes the relay weight and a harmful factor into account. A nonzero-sum game is established to capture the interaction between the URs and the detection strategies. The secrecy capacity is used as the game payoff to indicate the untrusted behaviors of the relays. The maximum probabilities of the relays' behaviors and the optimal system detection strategy can be obtained by the proposed algorithm.
Non-orthogonal multiple access (NOMA), a potentially promising technology in the 5G/B5G era, suffers from ubiquitous security threats due to the broadcast nature of the wireless medium. In this paper, we focus on artificial-signal-assisted and relay-assisted secure downlink transmission schemes against external eavesdropping in the context of physical layer security. To characterize the non-cooperative confrontation over the secrecy rate between the legitimate communication party and the eavesdropper, their interactions are modeled as a two-person zero-sum game. The existence of the Nash equilibrium of the proposed game models is proved, and the pure-strategy and mixed-strategy Nash equilibrium profiles in the two schemes are solved and analyzed, respectively. Numerical simulations are conducted to validate the analytical results, and show that the two schemes improve the secrecy rate and further enhance the physical layer security performance of NOMA systems.
This paper investigates the multi-player non-zero-sum game problem for unknown linear continuous-time systems with unmeasurable states. By accessing only input and output data, a data-driven learning control approach is proposed to estimate the N-tuple of dynamic output feedback control policies that form a Nash equilibrium solution to the multi-player non-zero-sum game problem. In particular, the explicit form of the dynamic output feedback Nash strategy is constructed by embedding the internal dynamics and solving coupled algebraic Riccati equations. Coupled policy-iteration-based iterative learning equations are established to estimate the N-tuple of feedback control gains without prior knowledge of the system matrices. Finally, an example illustrates the effectiveness of the proposed approach.
Dear Editor, This letter addresses the impulse game problem for a general scope of deterministic, multi-player, nonzero-sum differential games wherein all participants adopt impulse controls. Our objective is to formulate this impulse game problem with a modified objective function that includes interaction costs among the players in a discontinuous fashion, and subsequently to derive a verification theorem for identifying the feedback Nash equilibrium strategy.
Building heating, ventilating, and air conditioning (HVAC) systems have one of the largest energy footprints worldwide, which necessitates the design of intelligent control algorithms that improve energy utilization while still providing thermal comfort. In this work, the authors formulate the HVAC equipment dynamics in the setting of a two-player non-zero-sum cooperative game, which enables two decision variables (mass flow rate and supply air temperature) to perform joint optimization of control utilization and thermal setpoint tracking by simultaneously exchanging their policies. The HVAC zone serves as a game environment for these two decision variables, which act as two players in a game. It is assumed that dynamic models of the HVAC equipment are not available. Furthermore, neither the state nor any estimate of the HVAC disturbance (heat gains, outside variations, etc.) is accessible; only the measurement of the zone temperature is available for feedback. Under these constraints, the authors develop a new data-driven Q-learning scheme employing policy iteration and value iteration with a bias compensation mechanism that accounts for unmeasurable disturbances and circumvents the need for full-state measurement. The proposed algorithms are shown to converge to the optimal solution corresponding to the generalized algebraic Riccati equations (GAREs) in dynamic games.
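The paper's Q-learning scheme handles continuous dynamics, output feedback, and bias compensation; none of that is reproduced here. As a generic, hedged illustration of the underlying Q-learning value update only, here is a tabular sketch on a toy chain environment (the environment, function name, and hyperparameters are our assumptions):

```python
import numpy as np

def q_learning_chain(n=4, episodes=300, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    """Tabular Q-learning on a chain: states 0..n-1, actions 0=left, 1=right,
    reward 1 for reaching the rightmost (terminal) state."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n, 2))
    for _ in range(episodes):
        s = 0
        while s != n - 1:
            # Epsilon-greedy action selection.
            a = int(rng.integers(2)) if rng.random() < eps else int(np.argmax(Q[s]))
            s2 = min(s + 1, n - 1) if a == 1 else max(s - 1, 0)
            r = 1.0 if s2 == n - 1 else 0.0
            # Q-learning update; the terminal state's value is zero.
            Q[s, a] += alpha * (r + gamma * Q[s2].max() * (s2 != n - 1) - Q[s, a])
            s = s2
    return Q

Q = q_learning_chain()
print(np.argmax(Q, axis=1)[:-1])  # greedy policy: move right in every state
```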
This paper studies two-person nonzero-sum games for denumerable continuous-time Markov chains determined by transition rates, with an expected average criterion. The transition rates are allowed to be unbounded, and the payoff functions may be unbounded from above and from below. We give suitable conditions under which the existence of a Nash equilibrium is ensured. More precisely, using the so-called "vanishing discount" approach, a Nash equilibrium for the average criterion is obtained as a limit point of a sequence of equilibrium strategies for the discounted criterion as the discount factors tend to zero. Our results are illustrated with a birth-and-death game.
In this paper, a zero-sum game Nash equilibrium computation problem with event-triggered communication is investigated under an undirected weight-balanced multi-agent network. A novel distributed event-triggered projection subgradient algorithm is developed to reduce the communication burden within the subnetworks. In the proposed algorithm, when the difference between the current state of an agent and its state at the last trigger time exceeds a given threshold, the agent is triggered to communicate with its neighbours. Moreover, we prove that all agents converge to the Nash equilibrium under the proposed algorithm. Finally, two simulation examples verify that our algorithm not only reduces the communication burden but also ensures that the convergence speed and accuracy are close to those of the time-triggered method under an appropriate threshold.
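The trigger rule (communicate only when the state has drifted more than a threshold since the last broadcast) can be sketched in isolation. This is an illustrative reduction to plain consensus averaging, not the paper's projection subgradient algorithm; the network, threshold, and all names are assumptions.

```python
import numpy as np

def event_triggered_consensus(x0, W, threshold=0.05, rounds=200):
    """Agents broadcast their state only when it has drifted more than
    `threshold` from the last broadcast value; neighbours otherwise reuse the
    stale broadcast. Returns the final states and the number of broadcasts."""
    x = x0.astype(float).copy()
    last = x.copy()                          # last broadcast state of each agent
    sent = 0
    for _ in range(rounds):
        trigger = np.abs(x - last) > threshold
        last = np.where(trigger, x, last)    # triggered agents broadcast anew
        sent += int(trigger.sum())
        x = W @ last                         # consensus on (possibly stale) states
    return x, sent

# Four agents on a ring; W is doubly stochastic, so the average is preserved.
W = np.array([[0.5, 0.25, 0.0, 0.25],
              [0.25, 0.5, 0.25, 0.0],
              [0.0, 0.25, 0.5, 0.25],
              [0.25, 0.0, 0.25, 0.5]])
x0 = np.array([0.0, 1.0, 2.0, 7.0])
x_final, n_broadcasts = event_triggered_consensus(x0, W)
print(x_final, n_broadcasts)  # near-consensus around mean(x0) = 2.5, few broadcasts
```

Once the states settle within the threshold of their last broadcast, no further communication occurs, which is the source of the communication savings the abstract describes.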
This paper presents a novel optimal synchronization control method for multi-agent systems with input saturation. Multi-agent game theory is introduced to transform the optimal synchronization control problem into a multi-agent nonzero-sum game. Then, the Nash equilibrium can be achieved by solving the coupled Hamilton–Jacobi–Bellman (HJB) equations with nonquadratic input energy terms. A novel off-policy reinforcement learning method is presented to obtain the Nash equilibrium solution without the system models, and critic neural networks (NNs) and actor NNs are introduced to implement the presented method. Theoretical analysis shows that the iterative control laws converge to the Nash equilibrium. Simulation results show the good performance of the presented method.
The existence and uniqueness of the solutions for one kind of forward-backward stochastic differential equations, with Brownian motion and a Poisson process as the noise sources, were given under monotone conditions. These results were then applied to nonzero-sum differential games with random jumps to obtain the explicit form of the open-loop Nash equilibrium point from the solution of the forward-backward stochastic differential equations.
In this paper, based on the ACP (artificial societies, computational experiments, and parallel execution) approach, a parallel control method is proposed for zero-sum games of unknown time-varying systems. The process of constructing a sequence of artificial systems, implementing the computational experiments, and conducting the parallel execution is presented. The artificial systems are constructed to model the real system. Computational experiments adopting adaptive dynamic programming (ADP) are shown to derive control laws for the sequence of artificial systems. The purpose of the parallel execution step is to derive the control laws for the real system. Finally, simulation experiments show the effectiveness of the proposed method.
In this paper, an accelerated value iteration (VI) algorithm is established to solve the zero-sum game problem with a convergence guarantee. First, inspired by successive over-relaxation theory, the convergence rate of the iterative value function sequence is significantly accelerated with a relaxation factor. Second, the convergence and monotonicity of the value function sequence are analyzed under different ranges of the relaxation factor. Third, two practical approaches, namely the integrated scheme and the relaxation function, are introduced into the accelerated VI algorithm to guarantee the convergence of the iterative value function sequence for zero-sum games. The integrated scheme consists of an acceleration stage and a convergence stage, and the relaxation function can adjust the value of the relaxation factor. Finally, the performance of the accelerated VI algorithm is verified through two examples with practical physical backgrounds, including an autopilot controller.
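The over-relaxation idea can be sketched apart from the paper's setting: each sweep blends the current value function with the Bellman update, V ← (1 − ω)V + ω·T(V). A minimal illustration on a single-player discounted MDP follows (the paper addresses zero-sum games and adds an integrated scheme and a relaxation function on top; the toy problem, names, and value of ω here are our assumptions):

```python
import numpy as np

def relaxed_value_iteration(P, R, gamma=0.9, omega=1.2, iters=200):
    """Value iteration with successive over-relaxation: V <- (1-omega)V + omega*T(V).
    P[s, a, s'] are transition probabilities, R[s, a] the stage rewards.
    omega = 1 recovers standard value iteration; omega > 1 can accelerate it."""
    V = np.zeros(P.shape[0])
    for _ in range(iters):
        TV = (R + gamma * (P @ V)).max(axis=1)  # Bellman optimality operator
        V = (1 - omega) * V + omega * TV        # relaxed update
    return V

# Two-state MDP: state 0 can stay (reward 0) or move to state 1 (reward 1);
# state 1 is absorbing with reward 2 per step, so V(1) = 2/(1-0.9) = 20 and
# V(0) = 1 + 0.9 * 20 = 19.
P = np.zeros((2, 2, 2))
P[0, 0] = [1.0, 0.0]
P[0, 1] = [0.0, 1.0]
P[1, :] = [0.0, 1.0]
R = np.array([[0.0, 1.0], [2.0, 2.0]])

V = relaxed_value_iteration(P, R)
print(V)  # values approach [19, 20]
```

For this contraction, the relaxed per-state update factor is |1 − ω + ωγ| = 0.88 versus γ = 0.9 for plain VI, which is the kind of speedup the relaxation factor is meant to deliver.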
This paper studies a class of continuous-time two-person zero-sum stochastic differential games characterized by linear Itô differential equations with state-dependent noise and Markovian parameter jumps. Under the assumption of stochastic stabilizability, a necessary and sufficient condition for the existence of the optimal control strategies is presented by means of a system of coupled algebraic Riccati equations, using stochastic optimal control theory. Furthermore, the stochastic H∞ control problem for stochastic systems with Markovian jumps is discussed as an immediate application, and an illustrative example is presented.
When the maneuverability of a pursuer is not significantly higher than that of an evader, it is difficult to intercept the evader with only one pursuer. Therefore, this article adopts a two-to-one differential game strategy. The game of kind is generally formulated with angle optimization alone, which allows unlimited turns; such formulations ignore the effect of acceleration and do not correspond to the actual situation. Thus, in addition to angle optimization, acceleration optimization and an upper-bound constraint on acceleration are incorporated into the game. A two-to-one differential game problem is proposed in three-dimensional space, and an improved multi-objective grey wolf optimization (IMOGWO) algorithm is proposed to solve for the optimal game point of this problem. With equations describing the relative motion between the pursuers and the evader in three-dimensional space, a multi-objective function with constraints is given as the performance index to design an optimal strategy for the differential game. The optimal game point is then solved using the IMOGWO algorithm. It is proved, based on Markov chains, that with the IMOGWO the Pareto solution set is the solution of the differential game. Finally, simulations verify that the pursuers can capture the evader, and comparative experiments show that the IMOGWO algorithm performs well in terms of running time and memory usage.
Funding: The Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (No. 2018AAA0100901).
Funding: Supported by the National High Technology Research and Development Program of China (863 Program) (2006AA04Z183); the National Natural Science Foundation of China (60621001, 60534010, 60572070, 60774048, 60728307); and the Program for Changjiang Scholars and Innovative Research Groups of China (60728307, 4031002).
Funding: Supported by the National Natural Science Foundation of China (No. 61101223).
Funding: Supported by the National Natural Science Foundation of China under Grants U1836104, 61801073, 61931004, and 62072250; the National Key Research and Development Program of China under Grant 2021QY0700; and the Startup Foundation for Introducing Talent of NUIST under Grant 2021r039.
Funding: Supported by the National Key R&D Program of China under Grant No. 2021ZD0112600; the National Natural Science Foundation of China under Grant No. 62373058; the Beijing Natural Science Foundation under Grant No. L233003; the National Science Fund for Distinguished Young Scholars of China under Grant No. 62025301; the Postdoctoral Fellowship Program of CPSF under Grant No. GZC20233407; and the Basic Science Center Programs of NSFC under Grant No. 62088101.
Funding: Supported in part by the National Natural Science Foundation of China (62173051); the Fundamental Research Funds for the Central Universities (2024CDJCGJ012, 2023CDJXY-010); the Chongqing Technology Innovation and Application Development Special Key Project (CSTB2022TIAD-CUX0015, CSTB2022TIAD-KPX0162); and the China Postdoctoral Science Foundation (2024M763865).
Funding: Supported by the National Science Foundation for Distinguished Young Scholars of China (Grant No. 10925107) and the Guangdong Province Universities and Colleges Pearl River Scholar Funded Scheme (2011).
Funding: Project supported by the National Key R&D Program of China (No. 2018YFB1702300) and the National Natural Science Foundation of China (Nos. 61722312 and 61533017).
Funding: Project supported by the National Natural Science Foundation of China (No. 10371067), the Planned Item for the Outstanding Young Teachers of Ministry of Education of China (No. 2057), the Special Fund for Ph.D. Program of Ministry of Education of China (No. 20020422020), and the Fok Ying Tung Education Foundation for Young College Teachers (No. 91064).
Abstract: The existence and uniqueness of the solutions for one kind of forward-backward stochastic differential equations with Brownian motion and Poisson process as the noise source were given under monotone conditions. Then these results were applied to nonzero-sum differential games with random jumps to obtain the explicit form of the open-loop Nash equilibrium point from the solution of the forward-backward stochastic differential equations.
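A fully coupled forward-backward system with jumps of the kind referred to above has the following generic form, where W is a Brownian motion and Ñ a compensated Poisson random measure (schematic notation, not the paper's exact formulation):

```latex
\begin{aligned}
dX_t &= b(t, X_t, Y_t, Z_t)\,dt + \sigma(t, X_t, Y_t, Z_t)\,dW_t
        + \int_E g(t, X_{t-}, e)\,\tilde N(dt, de), & X_0 &= x,\\
dY_t &= -f(t, X_t, Y_t, Z_t, K_t)\,dt + Z_t\,dW_t
        + \int_E K_t(e)\,\tilde N(dt, de), & Y_T &= \Phi(X_T).
\end{aligned}
```

Monotone conditions on the coefficients then yield existence and uniqueness of the adapted solution (X, Y, Z, K); in the game application, the open-loop Nash equilibrium is read off from this solution.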
Funding: Supported in part by the National Key R&D Program of China (No. 2021YFE0206100), the National Natural Science Foundation of China (Nos. 62073321 and 62273036), the National Defense Basic Scientific Research Program (No. JCKY2019203C029), the Science and Technology Development Fund, Macao SAR (Nos. FDCT-22-009-MISE, 0060/2021/A2, and 0015/2020/AMJ), and the State Key Lab of Rail Traffic Control & Safety (No. RCS2021K005).
Abstract: In this paper, based on the ACP approach (artificial societies, computational experiments, and parallel execution), a parallel control method is proposed for zero-sum games of unknown time-varying systems. The process of constructing a sequence of artificial systems, implementing the computational experiments, and conducting the parallel execution is presented. The artificial systems are constructed to model the real system. Computational experiments adopting adaptive dynamic programming (ADP) are shown to derive control laws for the sequence of artificial systems. The purpose of the parallel execution step is to derive the control laws for the real system. Finally, simulation experiments are provided to show the effectiveness of the proposed method.
Funding: Supported in part by the National Natural Science Foundation of China under Grants 62222301, 61890930-5, and 62021003; the National Science and Technology Major Project under Grants 2021ZD0112302 and 2021ZD0112301; and the Beijing Natural Science Foundation under Grant JQ19013.
Abstract: In this paper, an accelerated value iteration (VI) algorithm is established to solve the zero-sum game problem with a convergence guarantee. First, inspired by successive over-relaxation theory, the convergence rate of the iterative value function sequence is significantly accelerated by means of a relaxation factor. Second, the convergence and monotonicity of the value function sequence are analyzed under different ranges of the relaxation factor. Third, two practical approaches, namely the integrated scheme and the relaxation function, are introduced into the accelerated VI algorithm to guarantee the convergence of the iterative value function sequence for zero-sum games. The integrated scheme consists of an acceleration stage and a convergence stage, and the relaxation function can adjust the value of the relaxation factor. Finally, the performance of the accelerated VI algorithm is verified through two examples with practical physical backgrounds, including an autopilot controller.
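The over-relaxed update at the heart of the abstract is V ← (1 − ω)V + ωTV, where T is the game Bellman operator and ω the relaxation factor (ω = 1 recovers standard VI). A minimal sketch on a toy zero-sum stochastic game, under two simplifying assumptions not taken from the paper: each stage game has a pure-strategy saddle point, and transitions depend only on the state:

```python
import numpy as np

def saddle_value(A):
    """Stage-game value, assuming a pure-strategy saddle point exists."""
    lower = A.min(axis=1).max()   # maximin (row player guarantees this)
    upper = A.max(axis=0).min()   # minimax (column player concedes at most this)
    assert np.isclose(lower, upper), "no pure saddle point"
    return lower

gamma = 0.9                                   # discount factor
R = [np.array([[3.0, 2.0], [1.0, 0.0]]),      # stage payoffs, state 0 (value 2)
     np.array([[0.0, -1.0], [2.0, 1.0]])]     # stage payoffs, state 1 (value 1)
P = np.array([[0.5, 0.5], [0.2, 0.8]])        # state-only transition matrix

def bellman(V):
    return np.array([saddle_value(R[s]) + gamma * P[s] @ V for s in range(2)])

def relaxed_vi(omega, iters=200):
    V = np.zeros(2)
    for _ in range(iters):
        V = (1 - omega) * V + omega * bellman(V)   # over-relaxed update
    return V

# Closed-form fixed point for this toy setup: V* = (I - gamma P)^{-1} val.
vals = np.array([saddle_value(R[s]) for s in range(2)])
V_star = np.linalg.solve(np.eye(2) - gamma * P, vals)
print(np.allclose(relaxed_vi(1.2), V_star))  # → True
```

In this example the relaxed iteration with ω = 1.2 contracts with factor about 0.88 instead of the nominal γ = 0.9, illustrating (in miniature) the acceleration the abstract claims.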
Abstract: This paper studies a class of continuous-time two-person zero-sum stochastic differential games characterized by linear Itô differential equations with state-dependent noise and Markovian parameter jumps. Under the assumption of stochastic stabilizability, a necessary and sufficient condition for the existence of the optimal control strategies is presented by means of a system of coupled algebraic Riccati equations, using stochastic optimal control theory. Furthermore, the stochastic H∞ control problem for stochastic systems with Markovian jumps is discussed as an immediate application, and an illustrative example is presented.
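Coupled algebraic Riccati equations of the kind described above often take the following schematic form for a Markov jump linear system with modes i = 1, …, N and transition rates λ_{ij} (generic H∞-type notation; the paper's exact system may differ):

```latex
A_i^{\top} P_i + P_i A_i + C_i^{\top} P_i C_i + Q_i + \sum_{j=1}^{N} \lambda_{ij} P_j
- P_i \bigl( B_i R_i^{-1} B_i^{\top} - \gamma^{-2} D_i D_i^{\top} \bigr) P_i = 0,
\qquad i = 1,\dots,N .
```

Here B_i is the control input matrix, D_i the disturbance input matrix, C_i the state-dependent noise matrix, and γ the disturbance attenuation level; there is one equation per Markov mode, coupled through the Σ_j λ_{ij} P_j term.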
Funding: National Natural Science Foundation of China (Nos. 61773142 and 62303136).
Abstract: When the maneuverability of a pursuer is not significantly higher than that of the evader, it is difficult to intercept the evader with a single pursuer. This article therefore adopts a two-on-one differential game strategy. The game of kind is usually formulated with angle optimization alone, which permits unlimited turns; such formulations ignore the effect of acceleration and do not match the actual situation. Building on the angle optimization, acceleration optimization and an upper-bound constraint on acceleration are therefore added to the game. A two-on-one differential game problem is posed in three-dimensional space, and an improved multi-objective grey wolf optimization (IMOGWO) algorithm is proposed to solve for the optimal game point. Using the equations that describe the relative motion between the pursuers and the evader in three-dimensional space, a constrained multi-objective function is given as the performance index to design an optimal strategy for the differential game. The optimal game point is then solved by the IMOGWO algorithm. It is proved via Markov chains that, with the IMOGWO, the Pareto solution set is the solution of the differential game. Finally, simulations verify that the pursuers can capture the evader, and comparative experiments show that the IMOGWO algorithm performs well in terms of running time and memory usage.
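For orientation, the base grey wolf optimizer underlying IMOGWO works by pulling each candidate toward the three best wolves (alpha, beta, delta) with a decaying encircling coefficient. The sketch below is the standard single-objective scheme on a toy quadratic, not the paper's improved multi-objective variant; all names and parameters are illustrative.

```python
import numpy as np

def gwo(f, dim=2, n_wolves=20, iters=200, lb=-5.0, ub=5.0, seed=0):
    """Minimal single-objective grey wolf optimizer (base scheme only;
    the paper's IMOGWO adds multi-objective handling and improvements)."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(lb, ub, (n_wolves, dim))
    for t in range(iters):
        fit = np.array([f(x) for x in X])
        order = np.argsort(fit)
        alpha, beta, delta = X[order[:3]]      # three leading wolves (copies)
        a = 2.0 * (1 - t / iters)              # encircling coefficient decays to 0
        for i in range(n_wolves):
            Xnew = np.zeros(dim)
            for leader in (alpha, beta, delta):
                r1, r2 = rng.random(dim), rng.random(dim)
                A = 2 * a * r1 - a             # exploration/exploitation balance
                C = 2 * r2
                D = np.abs(C * leader - X[i])  # distance to the leader
                Xnew += (leader - A * D) / 3.0 # average of the three pulls
            X[i] = np.clip(Xnew, lb, ub)
    fit = np.array([f(x) for x in X])
    return X[fit.argmin()], fit.min()

# Toy problem: minimize the sphere function, optimum at the origin.
best_x, best_f = gwo(lambda x: float(np.sum(x ** 2)))
print(f"best f = {best_f:.2e}")
```

As the coefficient `a` decays, the update contracts toward the leaders, which is what drives convergence; IMOGWO replaces the single fitness ranking with Pareto dominance to handle the multi-objective game index.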