Funding: Supported by the National Natural Science Foundation of China (No. 91848202) and the Special Foundation (Pre-Station) of China Postdoctoral Science (No. 2021TQ0089).
Abstract: Bolt assembly by robots is a vital and difficult task for replacing astronauts in extravehicular activities (EVA), but the trajectory efficiency of inserting the wrench into the hex hole of the bolt still needs improvement. In this paper, a policy iteration method based on reinforcement learning (RL) is proposed, in which improving trajectory efficiency is formulated as an RL-based objective optimization problem. First, the projection relation between raw data and the state-action space is established, and a policy iteration initialization method is designed on this projection to provide the initial policy for iteration. Policy iteration based on the protective policy is then applied to continuously evaluate and optimize the action-value function of all state-action pairs until convergence. To verify the feasibility and effectiveness of the proposed method, a noncontact demonstration experiment with human supervision is performed. Experimental results show that the policy iteration method obtains both the initialization policy and the generated policy from a limited number of demonstrations. A comparison between experiments with two different assembly tolerances shows that the converged generated policy achieves higher trajectory efficiency than the conservative one. In addition, the method ensures safety during training and improves the utilization efficiency of demonstration data.
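The evaluate-then-improve loop over all state-action pairs described in this abstract can be illustrated with a minimal tabular sketch. Everything below is an assumption made for illustration and is not the authors' method: the five-state chain environment, the reward of -1 per step, the discount factor, and the deliberately inefficient initial policy are hypothetical, and the demonstration-based initialization and protective policy of the paper are not modelled.

```python
def policy_iteration(n_states=5, gamma=0.9, sweeps=200):
    """Tabular policy iteration on a hypothetical 1-D chain (illustration only)."""
    actions = (-1, +1)          # move left or right along the chain
    goal = n_states - 1         # terminal state at the right end

    def step(s, a):
        # Deterministic transition, clipped to the chain; every step costs -1.
        s2 = min(max(s + a, 0), goal)
        return s2, -1.0

    # Hypothetical inefficient starting policy: always move left.
    policy = {s: -1 for s in range(goal)}

    while True:
        # Policy evaluation: estimate Q(s, a) for every state-action pair
        # under the current policy by repeated in-place sweeps.
        q = {(s, a): 0.0 for s in range(goal) for a in actions}
        for _ in range(sweeps):
            for s in range(goal):
                for a in actions:
                    s2, reward = step(s, a)
                    nxt = 0.0 if s2 == goal else q[(s2, policy[s2])]
                    q[(s, a)] = reward + gamma * nxt

        # Policy improvement: act greedily with respect to the evaluated Q.
        new_policy = {s: max(actions, key=lambda a: q[(s, a)])
                      for s in range(goal)}
        if new_policy == policy:    # stop once the policy no longer changes
            return policy, q
        policy = new_policy


if __name__ == "__main__":
    final_policy, q_values = policy_iteration()
    print(final_policy)
```

In this toy setting the loop stops once the greedy policy stops changing, which is the convergence criterion assumed here; the paper's own stopping rule is not stated in the abstract.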
Funding: Funded by the National Social Science Fund of China (No. 22CSH085).
Abstract: Introduction: School refusal behaviour (SRB) has become an increasingly significant global issue. In China, the mental health of children and adolescents has emerged as a major societal concern, prompting the development of related policy initiatives. The Chinese Ministry of Education, in collaboration with 17 other government departments, has issued a series of special action plans to address mental health problems among Chinese children.
Funding: Supported by the National Natural Science Foundation of China (62273320).
Abstract: In recent years, reinforcement learning control theory has been well developed. However, model-free value iteration needs many iterations to achieve the desired precision, and model-free policy iteration requires an initial stabilizing control policy. It is therefore important to develop a fast model-free algorithm that solves the continuous-time linear quadratic control problem without an initial stabilizing control policy. In this paper, we construct a homotopy path on which each point corresponds to a linear quadratic regulator problem. Based on policy iteration, model-based and model-free homotopy algorithms are proposed to solve the optimal control problem of continuous-time linear systems along the homotopy path. Our algorithms are accelerated using first-order differential information and do not require an initial stabilizing control policy. Finally, several practical examples are used to illustrate our results.
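To make concrete why standard policy iteration needs a stabilizing starting gain, and how a homotopy can remove that requirement, here is a minimal model-based sketch for a scalar system dx/dt = a*x + b*u with cost integrand q*x^2 + r*u^2. The homotopy used below (shifting `a` by a damping term that is reduced to zero) is an assumption chosen for illustration; the abstract does not specify the authors' homotopy path, and their model-free variant and first-order acceleration are not shown. The function names are hypothetical.

```python
import math


def policy_iteration_lqr(a, b, q, r, k0, iters=50):
    """Kleinman-style policy iteration for the scalar LQR problem.

    Requires a stabilizing initial gain k0, i.e. a - b*k0 < 0.
    """
    k = k0
    p = None
    for _ in range(iters):
        closed_loop = a - b * k
        if closed_loop >= 0:
            raise ValueError("gain is not stabilizing")
        # Policy evaluation: solve 2*(a - b*k)*p + q + r*k**2 = 0 for p.
        p = (q + r * k * k) / (-2.0 * closed_loop)
        # Policy improvement: k = b*p/r.
        k = b * p / r
    return k, p


def homotopy_lqr(a, b, q, r, steps=10):
    """Assumed homotopy: solve LQR for (a - shift), shrinking shift to 0.

    With shift0 > a the first problem is open-loop stable, so k = 0 is a
    valid start; each solution warm-starts the next problem on the path.
    """
    shift0 = max(a, 0.0) + 1.0
    k, p = 0.0, None
    for i in range(steps + 1):
        shift = shift0 * (1 - i / steps)   # shift goes from shift0 down to 0
        k, p = policy_iteration_lqr(a - shift, b, q, r, k)
    return k, p


if __name__ == "__main__":
    # Open-loop unstable plant: k0 = 0 would fail without the homotopy.
    k, p = homotopy_lqr(a=2.0, b=1.0, q=1.0, r=1.0)
    print(k, 2.0 + math.sqrt(5.0))   # second value is the Riccati solution
```

The step size along the path matters: the previous optimal gain must still stabilize the next problem, which holds for the parameters used here but is not guaranteed for arbitrary ones.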
Abstract: Policies and initiatives promoting carbon neutrality in the Nordic heating and transport systems are presented. For heating systems, the focus is the promotion of heat pumps (HPs); for transport systems, it is initiatives regarding electric vehicles (EVs). The conversion to HPs in the Nordic region is found to rely on both private economic and national economic incentives. Initiatives toward carbon neutrality in the transport system are mostly concentrated on research, development and demonstration for the deployment of a large number of EVs. All Nordic countries have plans for their future heating and transport systems with the ambition of realizing carbon neutrality.
Funding: Supported in part by the National Natural Science Foundation of China (62222301, 62473012, 62021003), the National Science and Technology Major Project (2021ZD0112302, 2021ZD0112301), and the Beijing Natural Science Foundation (F251019).
Abstract: In this paper, a novel hybrid event-triggered control (ETC) method is developed based on the online action-critic technique, aimed at the optimal regulation problem of discrete-time nonlinear systems. To ensure the normal execution of the online learning algorithm, a stability criterion is established to obtain the initial admissible control policy using an offline iterative method under the time-triggered control framework. Subsequently, a general triggering condition is designed based on the uniform ultimate boundedness of the controlled system. To determine a constant interval that ensures system stability, another triggering condition is introduced, and the asymptotic stability of the closed-loop system satisfying this condition is analyzed from the perspective of input-to-state stability. The designed online hybrid ETC method not only further improves control efficiency but also avoids continuous evaluation of the corresponding triggering condition. In addition, the event-based control law can approach the optimal control input within a finite approximation error. Finally, two experimental examples with physical background are conducted to demonstrate the present results.
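The basic idea of event-triggered control, updating the control input only when a triggering condition fires, can be shown with a small simulation. This is a sketch under stated assumptions, not the hybrid method of the abstract: the scalar linear plant, the fixed feedback gain, and the relative-error triggering rule with threshold `sigma` are all hypothetical, and no online learning is involved.

```python
def simulate_event_triggered(a=1.02, b=1.0, gain=0.1, sigma=0.2,
                             x0=1.0, steps=200):
    """Simulate x[k+1] = a*x[k] + b*u[k] with an event-triggered input.

    The input u = -gain * x_held is recomputed only when the gap between
    the current state and the last sampled state exceeds sigma * |x|
    (an assumed relative-threshold triggering rule).
    """
    x = x0
    x_held = x0        # state sampled at the most recent trigger
    triggers = 1       # the initial sample counts as one update
    for _ in range(steps):
        if abs(x - x_held) > sigma * abs(x):
            x_held = x             # event: refresh the sampled state
            triggers += 1
        u = -gain * x_held         # input held constant between events
        x = a * x + b * u
    return x, triggers


if __name__ == "__main__":
    x_final, n_updates = simulate_event_triggered()
    print(x_final, n_updates)
```

With these assumed parameters the held-state error is bounded by `sigma * |x|` whenever the input is applied, so the state contracts by at least the factor `(a - b*gain) + b*gain*sigma = 0.94` per step while the control law is recomputed on only a fraction of the steps.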