Funding: supported by the National Natural Science Foundation of China (62273320).
Abstract: In recent years, reinforcement learning control theory has developed considerably. However, model-free value iteration needs many iterations to achieve the desired precision, and model-free policy iteration requires an initial stabilizing control policy. It is therefore significant to propose a fast model-free algorithm that solves the continuous-time linear quadratic control problem without an initial stabilizing control policy. In this paper, we construct a homotopy path on which each point corresponds to a linear quadratic regulator problem. Based on policy iteration, model-based and model-free homotopy algorithms are proposed to solve the optimal control problem of continuous-time linear systems along the homotopy path. Our algorithms are accelerated by first-order differential information and do not require an initial stabilizing control policy. Finally, several practical examples illustrate our results.
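To make the homotopy idea concrete, the following is a minimal model-based sketch, not the authors' algorithm: it uses the standard Kleinman policy iteration as the inner solver and a hypothetical homotopy that deforms a shifted, Hurwitz system A(τ) = A − (1 − τ)σI back to the true A, so that K = 0 is stabilizing at τ = 0 and no initial stabilizing gain is required. The shift σ, step count, and example system are illustrative assumptions.

```python
# Hedged sketch: a homotopy over LQR problems with Kleinman policy
# iteration as the inner solver. A(tau) = A - (1 - tau)*sigma*I is
# Hurwitz at tau = 0 (for sigma large enough), so K = 0 is a valid
# starting policy there; the stabilizing gain is then continued along
# the path to the true system at tau = 1. The shift and step count are
# illustrative choices, not the paper's construction.
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def kleinman_pi(A, B, Q, R, K, iters=50, tol=1e-10):
    """Policy iteration for continuous-time LQR from a stabilizing gain K."""
    for _ in range(iters):
        Ak = A - B @ K
        # Policy evaluation: solve Ak' P + P Ak = -(Q + K' R K)
        P = solve_continuous_lyapunov(Ak.T, -(Q + K.T @ R @ K))
        K_new = np.linalg.solve(R, B.T @ P)  # policy improvement
        if np.linalg.norm(K_new - K) < tol:
            return K_new, P
        K = K_new
    return K, P

def homotopy_lqr(A, B, Q, R, n_steps=20):
    n = A.shape[0]
    # Shift large enough that A - sigma*I is Hurwitz.
    sigma = max(np.real(np.linalg.eigvals(A)).max(), 0.0) + 1.0
    K = np.zeros((B.shape[1], n))  # stabilizing for the tau = 0 system
    for tau in np.linspace(0.0, 1.0, n_steps + 1):
        A_tau = A - (1.0 - tau) * sigma * np.eye(n)
        K, P = kleinman_pi(A_tau, B, Q, R, K)  # warm-start from previous gain
    return K, P

# Example: an open-loop unstable system, solved without supplying any
# stabilizing initial policy.
A = np.array([[0.0, 1.0], [1.0, 0.0]])
B = np.array([[0.0], [1.0]])
K, P = homotopy_lqr(A, B, Q=np.eye(2), R=np.eye(1))
print("optimal gain:", K)
```

With small enough homotopy steps, the gain computed at each point on the path remains stabilizing for the next, which is what removes the need for an initial stabilizing policy; the paper's model-free variant would replace the Lyapunov solves with data-driven estimates.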
Funding: supported in part by the National Science Foundation (Nos. ECCS-2210320, CNS-2148304).
Abstract: In this paper, we study the robustness of policy optimization (in particular, the Gauss-Newton gradient descent algorithm, which is equivalent to policy iteration in reinforcement learning) subject to noise at each iteration. By invoking the concept of input-to-state stability and utilizing Lyapunov's direct method, it is shown that, if the noise is sufficiently small, the policy iteration algorithm converges to a small neighborhood of the optimal solution even in the presence of noise at each iteration. Explicit expressions are provided for the upper bound on the noise and the size of the neighborhood to which the policies ultimately converge. Based on Willems' fundamental lemma, a learning-based policy iteration algorithm is proposed. The persistent excitation condition can be readily guaranteed by checking the rank of the Hankel matrix built from an exploration signal. The robustness of the learning-based policy iteration to measurement noise and unknown system disturbances is established theoretically via the input-to-state stability of policy iteration. Several numerical simulations demonstrate the efficacy of the proposed method.
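The rank check mentioned in the abstract is simple to state: by Willems' fundamental lemma, an input sequence is persistently exciting of order L exactly when the depth-L Hankel matrix built from it has full row rank. The sketch below illustrates this for a scalar signal; the signals, the depth L, and the length T are illustrative assumptions, not the paper's setup.

```python
# Hedged sketch: checking persistency of excitation of an exploration
# signal via the rank of its Hankel matrix, in the spirit of Willems'
# fundamental lemma. Signal, depth L, and length T are illustrative.
import numpy as np

def hankel_matrix(u, L):
    """Depth-L Hankel matrix of a scalar signal u of length T.

    Columns are the sliding windows [u[i], ..., u[i+L-1]], i = 0..T-L.
    """
    T = len(u)
    return np.column_stack([u[i:i + L] for i in range(T - L + 1)])

def is_persistently_exciting(u, L, tol=1e-9):
    """u is persistently exciting of order L iff H_L(u) has full row rank."""
    H = hankel_matrix(u, L)
    return np.linalg.matrix_rank(H, tol=tol) == L

# Example: a sum of three sinusoids is rich enough for order 6, while a
# single sinusoid is not (it satisfies a second-order recurrence, so its
# Hankel matrix has rank 2 regardless of depth).
t = np.arange(100)
u_rich = np.sin(0.3 * t) + np.sin(1.1 * t) + np.sin(2.3 * t)
u_poor = np.sin(0.3 * t)
print(is_persistently_exciting(u_rich, L=6))  # True
print(is_persistently_exciting(u_poor, L=6))  # False
```

In a learning-based policy iteration of the kind described, this rank condition is what certifies that the collected input-state data are informative enough to replace the model in the policy evaluation step.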
Abstract: On the occasion of the 50th anniversary of the establishment of diplomatic relations between China and the European Union (EU) and the 10th anniversary of the adoption of the Paris Agreement, the Chinese and EU leaders hereby: Reiterate that in today's fluid and turbulent international situation, it is crucial that all countries, notably the major economies, maintain policy continuity and stability and step up efforts to address climate change.