This paper considers the value iteration algorithms of stochastic zero-sum linear quadratic games with unkown dynamics.On-policy and off-policy learning algorithms are developed to solve the stochastic zero-sum games,...This paper considers the value iteration algorithms of stochastic zero-sum linear quadratic games with unkown dynamics.On-policy and off-policy learning algorithms are developed to solve the stochastic zero-sum games,where the system dynamics is not required.By analyzing the value function iterations,the convergence of the model-based algorithm is shown.The equivalence of several types of value iteration algorithms is established.The effectiveness of model-free algorithms is demonstrated by a numerical example.展开更多
基金supported by the National Natural Science Foundation of China under Grant Nos.62122043,62192753,62433020,T2293770Natural Science Foundation of Shandong Province for Distinguished Young Scholars under Grant No.ZR2022JQ31.
文摘This paper considers the value iteration algorithms of stochastic zero-sum linear quadratic games with unkown dynamics.On-policy and off-policy learning algorithms are developed to solve the stochastic zero-sum games,where the system dynamics is not required.By analyzing the value function iterations,the convergence of the model-based algorithm is shown.The equivalence of several types of value iteration algorithms is established.The effectiveness of model-free algorithms is demonstrated by a numerical example.