Reinforcement learning (RL) has its roots in dynamic programming and is known as adaptive/approximate dynamic programming (ADP) within the control community. This paper reviews recent developments in ADP along with RL and its applications to various advanced control fields. First, the background of the development of ADP is described, emphasizing the significance of regulation and tracking control problems. Some effective offline and online algorithms for ADP/adaptive critic control are presented, and the main results for discrete-time systems and continuous-time systems are surveyed, respectively. Then, the research progress on adaptive critic control within the event-triggered framework and under uncertain environments is discussed, covering event-based design, robust stabilization, and game design. Moreover, the extensions of ADP for addressing control problems in complex environments have attracted enormous attention. The ADP architecture is revisited from the perspective of data-driven and RL frameworks, showing how they significantly advance the ADP formulation. Finally, several typical control applications of RL and ADP are summarized, particularly in the fields of wastewater treatment processes and power systems, followed by some general prospects for future research. Overall, this comprehensive survey of ADP and RL for advanced control applications demonstrates their remarkable potential in the artificial intelligence era, as well as their vital role in promoting environmental protection and industrial intelligence.
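At the core of the ADP/RL methods surveyed here is the Bellman equation for discounted optimal control. A minimal value-iteration sketch on a small illustrative MDP (the transition model and stage costs below are assumptions for illustration, not taken from the survey) shows the fixed point that adaptive critics approximate:

```python
import numpy as np

# Minimal value-iteration sketch for a 2-state, 2-action discounted MDP;
# the transition model P and stage costs g are illustrative assumptions.
P = np.array([                          # P[a, s, s'] = transition probability
    [[0.9, 0.1], [0.2, 0.8]],           # action 0
    [[0.5, 0.5], [0.7, 0.3]],           # action 1
])
g = np.array([[1.0, 2.0], [0.5, 3.0]])  # g[a, s] = stage cost
gamma = 0.9

V = np.zeros(2)
for _ in range(500):
    # Bellman update: V(s) <- min_a [ g(s,a) + gamma * E[ V(s') | s, a ] ]
    Q = g + gamma * np.einsum('aij,j->ai', P, V)
    V_new = Q.min(axis=0)
    if np.max(np.abs(V_new - V)) < 1e-10:
        V = V_new
        break
    V = V_new

policy = Q.argmin(axis=0)               # greedy (optimal) policy
```

ADP methods replace the exact table `V` with a trained critic (e.g., a neural network), but the contraction being approximated is the same Bellman update shown in the loop.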
Integral reinforcement learning (IRL) is an effective tool for solving optimal control problems of nonlinear systems, and it has been widely utilized in optimal controller design for discrete-time nonlinear systems. However, solving the Hamilton-Jacobi-Bellman (HJB) equation for nonlinear systems requires precise and complicated dynamics. Moreover, the research and application of IRL in continuous-time (CT) systems need further development. To develop IRL for CT nonlinear systems, a data-based adaptive neural dynamic programming (ANDP) method is proposed to investigate the optimal control problem of uncertain CT multi-input systems, so that knowledge of the dynamics in the HJB equation is unnecessary. First, the multi-input model is approximated using a neural network (NN), which can be utilized to design an integral reinforcement signal. Subsequently, two critic networks and one action network are constructed based on the integral reinforcement signal. A nonzero-sum Nash equilibrium can be reached by learning the optimal strategies of the multi-input model. In this scheme, the NN weights are continuously updated using an adaptive algorithm. The weight convergence and the system stability are analyzed in detail. The optimal control problem of a multi-input nonlinear CT system is effectively solved using the ANDP scheme, and the results are verified by a simulation study.
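The integral reinforcement signal at the heart of IRL replaces explicit knowledge of the CT dynamics with a measured cost integral, via the interval Bellman identity V(x(t)) = ∫_t^{t+T} r(x,u) dτ + V(x(t+T)). The sketch below checks this identity numerically on a scalar linear system whose quadratic value function is known in closed form; the system coefficients are illustrative assumptions, not from the paper:

```python
# Sketch of the integral reinforcement signal for the scalar linear system
# x' = a*x + b*u with feedback u = -k*x and cost r(x,u) = q*x^2 + ru*u^2.
# All coefficients below are illustrative assumptions.
a, b, k, q, ru = -1.0, 1.0, 0.5, 1.0, 0.1
acl = a - b * k                        # closed-loop pole (stable: acl < 0)
P = (q + ru * k**2) / (-2.0 * acl)     # value function V(x) = P*x^2

def V(x):
    return P * x**2

def integral_reinforcement(x0, T, n=20000):
    """Accumulate the measured cost integral along the closed-loop trajectory."""
    dt = T / n
    x, cost = x0, 0.0
    for _ in range(n):
        u = -k * x
        cost += (q * x**2 + ru * u**2) * dt
        x += (a * x + b * u) * dt      # forward Euler step
    return cost, x

x0, T = 2.0, 0.5
rho, xT = integral_reinforcement(x0, T)
# Interval Bellman identity: V(x(t)) = integral reinforcement + V(x(t+T))
residual = abs(V(x0) - (rho + V(xT)))
```

In the ANDP scheme this measured signal `rho` takes the place of the model-based HJB terms, which is why explicit dynamics knowledge becomes unnecessary.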
Reinforcement learning (RL) has been widely studied as an efficient class of machine learning methods for adaptive optimal control under uncertainty. In recent years, the applications of RL in optimised decision-making and motion control of intelligent vehicles have received increasing attention. Due to the complex and dynamic operating environments of intelligent vehicles, it is necessary to improve the learning efficiency and generalisation ability of RL-based decision and control algorithms under different conditions. This survey systematically examines the theoretical foundations, algorithmic advancements and practical challenges of applying RL to intelligent vehicle systems operating in complex and dynamic environments. The major algorithmic frameworks of RL are first introduced, and the recent advances in RL-based decision-making and control of intelligent vehicles are reviewed. In addition to self-learning decision and control approaches using state measurements, the developments of deep reinforcement learning (DRL) methods for end-to-end driving control of intelligent vehicles are summarised. Open problems and directions for further research are also discussed.
Heat integration is important for energy saving in the process industry. It is linked to the persistently challenging task of optimal design of heat exchanger networks (HEN). Due to the inherently highly nonconvex, nonlinear, and combinatorial nature of the HEN problem, it is not easy to find high-quality solutions for large-scale problems. The reinforcement learning (RL) method, which learns strategies through ongoing exploration and exploitation, shows advantages in this area. However, due to the complexity of the HEN design problem, a dedicated RL method must be designed for HEN. A hybrid strategy combining RL with mathematical programming is proposed to take better advantage of both methods. An insightful state representation of the HEN structure as well as a customized reward function is introduced. A Q-learning algorithm is applied to update the HEN structure using the ε-greedy strategy. Better results are obtained on three literature cases of different scales.
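The Q-learning update with ε-greedy exploration that drives the HEN structure search can be sketched generically; the toy chain environment below is a placeholder assumption standing in for the paper's customized HEN state representation and reward function:

```python
import random

# Generic tabular Q-learning with ε-greedy exploration, sketching the update
# rule behind the HEN structure search; the toy chain below is an assumption,
# not the paper's HEN state representation or reward.
N_STATES, ACTIONS = 5, (0, 1)            # actions: 0 = left, 1 = right
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.1

def step(s, a):
    s2 = max(0, min(N_STATES - 1, s + (1 if a == 1 else -1)))
    return s2, (1.0 if s2 == N_STATES - 1 else 0.0), s2 == N_STATES - 1

rng = random.Random(0)
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def greedy(s):
    best = max(Q[(s, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if Q[(s, a)] == best])

for _ in range(2000):                    # episodes
    s, done, steps = 0, False, 0
    while not done and steps < 100:
        # ε-greedy: explore with probability EPS, otherwise act greedily
        a = rng.choice(ACTIONS) if rng.random() < EPS else greedy(s)
        s2, r, done = step(s, a)
        target = r if done else r + GAMMA * max(Q[(s2, b)] for b in ACTIONS)
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])    # Q-learning update
        s, steps = s2, steps + 1

policy = [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES)]
```

In the paper's hybrid strategy, the environment step would instead modify the HEN structure and score it with the customized reward, while the update rule stays the same.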
Approximate dynamic programming (ADP) is a general and effective approach for solving optimal control and estimation problems by adapting to uncertain and nonconvex environments over time.
In this paper we discuss policy iteration methods for approximate solution of a finite-state discounted Markov decision problem, with a focus on feature-based aggregation methods and their connection with deep reinforcement learning schemes. We introduce features of the states of the original problem, and we formulate a smaller "aggregate" Markov decision problem, whose states relate to the features. We discuss properties and possible implementations of this type of aggregation, including a new approach to approximate policy iteration. In this approach the policy improvement operation combines feature-based aggregation with feature construction using deep neural networks or other calculations. We argue that the cost function of a policy may be approximated much more accurately by the nonlinear function of the features provided by aggregation than by the linear function of the features provided by neural network-based reinforcement learning, thereby potentially leading to more effective policy improvement.
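Hard aggregation can be sketched concretely: a feature map groups the original states into aggregate states, the small aggregate problem is solved exactly, and its values are lifted back as a piecewise-constant approximation of the policy cost. The single-policy MDP, feature map, and uniform disaggregation weights below are illustrative assumptions:

```python
import numpy as np

# Hard-aggregation sketch: 6 original states are grouped by a feature map phi
# into 2 aggregate states; the aggregate evaluation equation is solved exactly
# and its values are lifted back. The MDP and feature map are assumptions.
n, gamma = 6, 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n), size=n)      # single-policy transition matrix
g = rng.uniform(0.0, 1.0, size=n)          # stage costs

phi = np.array([0, 0, 0, 1, 1, 1])         # feature: which half of the chain
m = 2
D = np.zeros((m, n)); D[phi, np.arange(n)] = 1.0
D /= D.sum(axis=1, keepdims=True)          # disaggregation (uniform weights)
Phi = np.zeros((n, m)); Phi[np.arange(n), phi] = 1.0   # aggregation matrix

P_agg = D @ P @ Phi                        # aggregate dynamics
g_agg = D @ g                              # aggregate stage costs

# Solve the small aggregate evaluation equation r = g_agg + gamma * P_agg @ r
r = np.linalg.solve(np.eye(m) - gamma * P_agg, g_agg)
J_approx = Phi @ r                         # lift back: piecewise-constant J
J_exact = np.linalg.solve(np.eye(n) - gamma * P, g)
```

The paper's point is that `phi` need not be handcrafted: it can be constructed from a deep neural network, and the resulting approximation is a nonlinear function of those features.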
We discuss the solution of complex multistage decision problems using methods that are based on the idea of policy iteration (PI), i.e., start from some base policy and generate an improved policy. Rollout is the simplest method of this type, where just one improved policy is generated. We can view PI as repeated application of rollout, where the rollout policy at each iteration serves as the base policy for the next iteration. In contrast with PI, rollout has a robustness property: it can be applied on-line and is suitable for on-line replanning. Moreover, rollout can use as base policy one of the policies produced by PI, thereby improving on that policy. This is the type of scheme underlying the prominently successful AlphaZero chess program. In this paper we focus on rollout and PI-like methods for problems where the control consists of multiple components, each selected (conceptually) by a separate agent. This is the class of multiagent problems where the agents have a shared objective function, and shared and perfect state information. Based on a problem reformulation that trades off control space complexity with state space complexity, we develop an approach whereby at every stage the agents sequentially (one-at-a-time) execute a local rollout algorithm that uses a base policy, together with some coordinating information from the other agents. The amount of total computation required at every stage grows linearly with the number of agents. By contrast, in the standard rollout algorithm, the amount of total computation grows exponentially with the number of agents. Despite the dramatic reduction in required computation, we show that our multiagent rollout algorithm has the fundamental cost improvement property of standard rollout: it guarantees an improved performance relative to the base policy. We also discuss autonomous multiagent rollout schemes that allow the agents to make decisions autonomously through the use of precomputed signaling information, which is sufficient to maintain the cost improvement property, without any on-line coordination of control selection between the agents. For discounted and other infinite horizon problems, we also consider exact and approximate PI algorithms involving a new type of one-agent-at-a-time policy improvement operation. For one of our PI algorithms, we prove convergence to an agent-by-agent optimal policy, thus establishing a connection with the theory of teams. For another PI algorithm, which is executed over a more complex state space, we prove convergence to an optimal policy. Approximate forms of these algorithms are also given, based on the use of policy and value neural networks. These PI algorithms, in both their exact and their approximate form, are strictly off-line methods, but they can be used to provide a base policy for use in an on-line multiagent rollout scheme.
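The linear-versus-exponential computation trade-off can be illustrated on a one-shot simplification of multiagent rollout, where each agent minimizes over its own choice with earlier agents fixed and later agents held at the base policy; the team cost and base policy below are illustrative assumptions:

```python
import itertools

# One-agent-at-a-time rollout sketch on a one-shot 3-agent problem: each agent
# picks 0/1 and the team pays a shared cost. The cost function and base policy
# are illustrative assumptions.
AGENTS, CHOICES = 3, (0, 1)
base = [0, 0, 0]                                   # base policy (all zeros)

def cost(u):                                       # shared team cost
    u1, u2, u3 = u
    return (u1 - 1)**2 + (u2 - 1)**2 + (u3 - 1)**2 + 0.1 * u1 * u3

evals = 0
def evaluated_cost(u):
    global evals
    evals += 1
    return cost(u)

# Sequential improvement: agent i minimises over its own choice, with
# already-decided agents fixed and later agents held at the base policy.
u = list(base)
for i in range(AGENTS):
    u[i] = min(CHOICES,
               key=lambda c: evaluated_cost(u[:i] + [c] + base[i + 1:]))
seq_evals, seq_cost = evals, cost(u)               # 2 evaluations per agent

# Standard (joint) rollout would enumerate the full product action space,
# i.e. |CHOICES| ** AGENTS evaluations.
joint_cost = min(cost(v) for v in itertools.product(CHOICES, repeat=AGENTS))
```

Here the sequential choice happens to match the joint optimum; in general only improvement over the base policy is guaranteed, which is the cost improvement property the paper proves for the full multistage setting.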
Tremendous amounts of data are generated and saved in many complex engineering and social systems every day. It is significant and feasible to utilize this big data to make better decisions through machine learning techniques. In this paper, we focus on batch reinforcement learning (RL) algorithms for discounted Markov decision processes (MDPs) with large discrete or continuous state spaces, aiming to learn the best possible policy given a fixed amount of training data. Batch RL algorithms with handcrafted feature representations work well for low-dimensional MDPs. However, for many real-world RL tasks, which often involve high-dimensional state spaces, it is difficult and even infeasible to use feature engineering methods to design features for value function approximation. To cope with high-dimensional RL problems, the desire to obtain data-driven features has led to much work on incorporating feature selection and feature learning into traditional batch RL algorithms. In this paper, we provide a comprehensive survey on automatic feature selection and unsupervised feature learning for high-dimensional batch RL. Moreover, we present recent theoretical developments on applying statistical learning to establish finite-sample error bounds for batch RL algorithms based on weighted Lp norms. Finally, we outline some future directions in the research of RL algorithms, theories and applications.
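Fitted Q-iteration is a canonical batch RL algorithm of the kind surveyed: it repeatedly regresses Bellman targets computed from a fixed batch of transitions onto a feature representation. The sketch below uses one-hot (tabular) features on a toy chain as a stand-in assumption for the handcrafted or learned features the survey discusses:

```python
import numpy as np

# Fitted Q-iteration sketch on a fixed batch of transitions from a toy 1-D
# chain; the one-hot features stand in for the representations discussed in
# the survey. Environment and behaviour policy are illustrative assumptions.
rng = np.random.default_rng(1)
N, GAMMA = 5, 0.9

def step(s, a):                         # deterministic chain, goal at state 4
    s2 = max(0, min(N - 1, s + (1 if a == 1 else -1)))
    return s2, (1.0 if s2 == N - 1 else 0.0), s2 == N - 1

# Fixed batch D = {(s, a, r, s', done)} from a random behaviour policy
batch = []
for _ in range(200):
    s = int(rng.integers(0, N - 1))
    a = int(rng.integers(0, 2))
    s2, r, done = step(s, a)
    batch.append((s, a, r, s2, done))

def feats(s, a):                        # one-hot features over (state, action)
    x = np.zeros(2 * N)
    x[a * N + s] = 1.0
    return x

w = np.zeros(2 * N)
for _ in range(100):                    # fitted Q-iteration sweeps
    X = np.array([feats(s, a) for s, a, *_ in batch])
    y = np.array([r + (0.0 if done else
                       GAMMA * max(feats(s2, b) @ w for b in (0, 1)))
                  for s, a, r, s2, done in batch])
    w, *_ = np.linalg.lstsq(X, y, rcond=None)   # regression of Bellman targets

policy = [int(np.argmax([feats(s, a) @ w for a in (0, 1)])) for s in range(N)]
```

With richer feature maps (the high-dimensional case), the same loop applies; the survey's subject is how to choose or learn `feats` automatically, and the finite-sample bounds quantify the resulting error.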
Reinforcement Learning (RL) techniques are being studied to solve Demand and Capacity Balancing (DCB) problems so as to fully exploit their computational performance. A locally generalised Multi-Agent Reinforcement Learning (MARL) method for real-world DCB problems is proposed. The proposed method can deploy trained agents directly to unseen scenarios in a specific Air Traffic Flow Management (ATFM) region to quickly obtain a satisfactory solution. In this method, the agents of all flights in a scenario form a multi-agent decision-making system based on partial observation. A trained agent with the customised neural network can be deployed directly on the corresponding flight, allowing the agents to solve the DCB problem jointly. A cooperation coefficient is introduced into the reward function and is used to adjust an agent's cooperation preference in the multi-agent system, thereby controlling the distribution of flight delay time allocation. A multi-iteration mechanism is designed for the DCB decision-making framework to deal with problems arising from non-stationarity in MARL and to ensure that all hotspots are eliminated. Experiments based on large-scale, high-complexity real-world scenarios are conducted to verify the effectiveness and efficiency of the method. From a statistical point of view, it is shown that the proposed method generalises within the scope of the flights and sectors of interest, and that its optimisation performance outperforms both standard computer-assisted slot allocation and state-of-the-art RL-based DCB methods. A sensitivity analysis preliminarily reveals the effect of the cooperation coefficient on delay time allocation.
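The cooperation coefficient can be sketched as a weighting between an agent's own delay and the delays of its peers; the function below is a simplified illustration (the paper's reward also accounts for hotspot elimination and other terms), and all names and values are assumptions:

```python
# Sketch of a cooperation-weighted reward: a coefficient trades an agent's own
# delay against the average delay of the other flights. This is a simplified
# illustration; the paper's reward shaping is richer (hotspot terms, etc.).
def cooperative_reward(delays, i, coop=0.5):
    """Reward for agent i given per-flight delay times (e.g., minutes)."""
    own = delays[i]
    others = [d for j, d in enumerate(delays) if j != i]
    social = sum(others) / len(others) if others else 0.0
    # coop = 0 -> purely selfish agent; coop = 1 -> own delay and average
    # peer delay weighted equally.
    return -(own + coop * social)

delays = [10.0, 0.0, 30.0]
selfish = cooperative_reward(delays, 0, coop=0.0)
shared = cooperative_reward(delays, 0, coop=1.0)
```

Raising `coop` makes every agent absorb more of the collective delay, which is the mechanism behind the paper's observation that the coefficient controls how delay time is distributed across flights.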
We present a state-of-the-art reinforcement learning (RL) control approach to automatically configure robotic prosthesis impedance parameters to enable end-to-end, continuous locomotion intended for transfemoral amputee subjects. Specifically, our actor-critic based RL provides tracking control of a robotic knee prosthesis to mimic the intact knee profile. This is a significant advance over our previous RL-based automatic tuning of prosthesis control parameters, which centered on regulation control with a designer-prescribed robotic knee profile as the target. In addition to presenting the tracking control algorithm based on direct heuristic dynamic programming (dHDP), we provide a control performance guarantee, including the case of constrained inputs. We show that our proposed tracking control possesses several important properties, such as weight convergence of the learning networks, Bellman (sub)optimality of the cost-to-go value function and control input, and practical stability of the human-robot system. We further provide a systematic simulation of the proposed tracking control using a realistic human-robot system simulator, OpenSim, to emulate how the dHDP enables level-ground walking, walking on different terrains, and walking at different paces. These results show that our proposed dHDP-based tracking control is not only theoretically sound, but also practically useful.
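The interleaved critic and actor updates of a dHDP-style scheme can be sketched on a scalar plant with a quadratic critic and a linear actor; the plant, gains, and learning rates below are illustrative assumptions, not the paper's prosthesis model or network architecture:

```python
# Minimal dHDP-flavoured actor-critic sketch: quadratic critic J(x) = wc*x^2
# and linear actor u = wa*x on the scalar plant x' = a*x + b*u with stage
# cost r = x^2 + u^2. All coefficients are illustrative assumptions; the
# point is the interleaved critic (temporal-difference) and actor
# (critic-gradient) updates.
a, b, gamma = 0.8, 1.0, 0.95
lr_c, lr_a = 0.05, 0.01
wc, wa = 1.0, 0.0                        # critic and actor weights

x, traj = 1.0, [1.0]
for _ in range(400):
    u = wa * x
    x2 = a * x + b * u
    r = x * x + u * u
    # Critic: semi-gradient TD(0) step on delta = r + gamma*J(x2) - J(x).
    delta = r + gamma * wc * x2 * x2 - wc * x * x
    wc += lr_c * delta * x * x
    # Actor: descend d/du [ r + gamma*J(x2) ] = 2u + 2*gamma*wc*b*x2,
    # propagated to the weight through du/dwa = x.
    wa -= lr_a * (2.0 * u + 2.0 * gamma * wc * b * x2) * x
    x = x2
    traj.append(abs(x))
```

The trajectory contracting toward the origin while the weights settle mirrors, in miniature, the weight-convergence and practical-stability properties the paper establishes for its neural-network implementation.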
The optimal dispatch of energy storage systems (ESSs) in distribution networks poses significant challenges, primarily due to uncertainties in dynamic pricing, fluctuating demand, and the variability inherent in renewable energy sources. By exploiting the generalization capabilities of deep neural networks (DNNs), deep reinforcement learning (DRL) algorithms can learn good-quality control models that adapt to the stochastic nature of distribution networks. Nevertheless, the practical deployment of DRL algorithms is often hampered by their limited capacity for satisfying operational constraints in real time, which is a crucial requirement for ensuring the reliability and feasibility of control actions during online operation. This paper introduces an innovative framework, named mixed-integer programming based deep reinforcement learning (MIP-DRL), to overcome these limitations. The proposed MIP-DRL framework can rigorously enforce operational constraints for the optimal dispatch of ESSs during online execution. The framework involves training a Q-function with DNNs, which is subsequently represented in a mixed-integer programming (MIP) formulation. This combination allows for the seamless integration of operational constraints into the decision-making process. The effectiveness of the proposed MIP-DRL framework is validated through numerical simulations, which demonstrate its superior capability to enforce all operational constraints, achieve high-quality dispatch decisions, and outperform existing DRL algorithms.
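Conceptually, MIP-DRL replaces the usual greedy action selection with a constrained optimization over the trained Q-network: maximize Q(s, a) subject to the operational constraints on a. The paper encodes the ReLU network exactly in a MIP handed to a solver; the sketch below conveys the same constrained argmax for a toy ReLU Q-network by enumerating a discretised, box-constrained action grid (all weights, bounds, and names are assumptions):

```python
import numpy as np

# MIP-DRL action-selection idea: max_a Q_theta(s, a) subject to operational
# constraints on a. The paper encodes the ReLU network in a MIP; this sketch
# replaces the solver with enumeration over a discretised action grid for a
# toy 2-layer ReLU Q-network. All weights and bounds are assumptions.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 3)), rng.normal(size=8)   # input: (state, action)
W2, b2 = rng.normal(size=8), 0.0

def q_value(state, action):
    z = np.maximum(0.0, W1 @ np.array([state[0], state[1], action]) + b1)
    return float(W2 @ z + b2)

def constrained_argmax(state, p_min=-0.5, p_max=0.5, grid=101):
    """Pick the feasible action (e.g., ESS charge/discharge power within
    [p_min, p_max]) with the highest Q-value."""
    actions = np.linspace(p_min, p_max, grid)          # box constraint
    return float(max(actions, key=lambda a: q_value(state, a)))

a_star = constrained_argmax((0.3, -0.2))
```

The advantage of the exact MIP encoding over this enumeration is that it scales to continuous, multi-dimensional actions and network-wide constraints (power flow limits, voltage bounds) while still guaranteeing feasibility of the selected action.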
Finding out the most effective parameters relating to the resistance of reinforced concrete connections (RCCs) is an important topic in structural engineering. In this study, first, a finite element (FE) model is developed for simulating the performance of RCCs under post-earthquake fire (PEF). Then surrogate models, including multiple linear regression (MLR), multiple natural logarithm (Ln) equation regression (MLnER), gene expression programming (GEP), and an ensemble model, are used to predict the remaining load-carrying capacity of an RCC under PEF. Statistical parameters, error terms, and a novel statistical table are used to evaluate and compare the accuracy of each surrogate model. According to the results, the ratio of the longitudinal reinforcement bars of the column (RLC) has a significant effect on the resistance of an RCC under PEF. Increasing the value of this parameter from 1% to 8% can increase the residual load-carrying capacity of an RCC under PEF by 492.2% when the RCC is exposed to fire at a temperature of 1000°C. Moreover, based on the results, the ensemble model can predict the residual load-carrying capacity with suitable accuracy. A safety factor of 1.55 should be applied to the results obtained from the ensemble model.
Surface Penetrating Radar (SPR) is a recently developed technology for non-destructive testing. It can be used to image and interpret the inner structure of reinforced concrete. This paper describes a compact, handheld SPR developed recently for reinforced concrete structure detection. The center operating frequency of the radar is 1.6 GHz. Not only does it have fast acquisition capability, but it can also display the testing result on the LCD screen in real time. The testing results show that the radar has a penetrating range of more than 30 cm and a lateral resolution better than 5 cm. This performance validates that the radar can meet the application requirements for reinforced concrete structure detection.
Funding: supported in part by the National Natural Science Foundation of China (62222301, 62073085, 62073158, 61890930-5, 62021003), the National Key Research and Development Program of China (2021ZD0112302, 2021ZD0112301, 2018YFC1900800-5), and the Beijing Natural Science Foundation (JQ19013).
Funding: supported by the National Natural Science Foundation of China under Grant T2521006, Grant 62403483, Grant 62533021, and Grant U24A20279.
Funding: the financial support provided by the National Natural Science Foundation of China (U22A20415, 21978256, 22308314) and the "Pioneer" and "Leading Goose" Research & Development Program of Zhejiang (2022C01SA442617) is acknowledged.
Funding: supported by the National Natural Science Foundation of China (Nos. 61034002, 61233001 and 61273140).
Funding: co-funded by the National Natural Science Foundation of China (No. 61903187), the National Key R&D Program of China (No. 2021YFB1600500), the China Scholarship Council (No. 202006830095), the Natural Science Foundation of Jiangsu Province (No. BK20190414), and the Jiangsu Province Postgraduate Innovation Fund (No. KYCX20_0213).
Funding: This work was partly supported by the National Science Foundation (Grants 1563921, 1808752, 1563454, and 1808898).
Abstract: We address a state-of-the-art reinforcement learning (RL) control approach to automatically configure robotic prosthesis impedance parameters to enable end-to-end, continuous locomotion intended for transfemoral amputee subjects. Specifically, our actor-critic based RL provides tracking control of a robotic knee prosthesis to mimic the intact-knee profile. This is a significant advance over our previous RL-based automatic tuning of prosthesis control parameters, which centered on regulation control with a designer-prescribed robotic knee profile as the target. In addition to presenting the tracking control algorithm based on direct heuristic dynamic programming (dHDP), we provide a control performance guarantee, including the case of constrained inputs. We show that our proposed tracking control possesses several important properties, such as weight convergence of the learning networks, Bellman (sub)optimality of the cost-to-go value function and control input, and practical stability of the human-robot system. We further provide a systematic simulation of the proposed tracking control using a realistic human-robot system simulator, OpenSim, to emulate how the dHDP enables level-ground walking and walking on different terrains and at different paces. These results show that our proposed dHDP-based tracking control is not only theoretically sound but also practically useful.
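The actor-critic structure underlying heuristic dynamic programming can be illustrated on a scalar toy system: a critic estimates the cost-to-go and an actor gain descends the critic's estimate. This is a generic sketch in the spirit of HDP, not the dHDP design or the prosthesis controller from the paper; the system, parameterisations, and learning rates are all invented:

```python
import numpy as np

# Toy system x_{k+1} = x_k + u_k with stage cost x^2 + u^2 (all assumed).
gamma, lr_c, lr_a = 0.95, 0.05, 0.01
w, k = 1.0, 0.1          # critic: J(x) ~ w*x^2; actor: u = -k*x

rng = np.random.default_rng(0)
for _ in range(5000):
    x = rng.uniform(-1, 1)
    u = -k * x
    x_next = x + u
    cost = x**2 + u**2
    # critic: temporal-difference step toward cost + gamma * J(x_next)
    td = cost + gamma * w * x_next**2 - w * x**2
    w += lr_c * td * x**2
    # actor: descend d/du [u^2 + gamma * J(x+u)] = 2u + 2*gamma*w*(x+u);
    # since u = -k*x, du/dk = -x, so descent on k adds +grad_u*x
    grad_u = 2 * u + 2 * gamma * w * (x + u)
    k = float(np.clip(k + lr_a * grad_u * x, 0.0, 1.0))

print(round(k, 2))  # learned gain stabilises x_{k+1} = (1 - k) * x_k
```

For this toy problem the gain settles near the discounted LQR solution (about 0.6), echoing the weight-convergence property the abstract claims for the full method.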
Funding: Supported by the DATALESs project (No. 482.20.602), jointly financed by the Netherlands Organization for Scientific Research (NWO) and the National Natural Science Foundation of China.
Abstract: The optimal dispatch of energy storage systems (ESSs) in distribution networks poses significant challenges, primarily due to the uncertainties of dynamic pricing, fluctuating demand, and the variability inherent in renewable energy sources. By exploiting the generalization capabilities of deep neural networks (DNNs), deep reinforcement learning (DRL) algorithms can learn good-quality control models that adapt to the stochastic nature of distribution networks. Nevertheless, the practical deployment of DRL algorithms is often hampered by their limited capacity for satisfying operational constraints in real time, which is a crucial requirement for ensuring the reliability and feasibility of control actions during online operation. This paper introduces an innovative framework, named mixed-integer programming based deep reinforcement learning (MIP-DRL), to overcome these limitations. The proposed MIP-DRL framework can rigorously enforce operational constraints for the optimal dispatch of ESSs during online execution. The framework involves training a Q-function with DNNs, which is subsequently represented in a mixed-integer programming (MIP) formulation. This combination allows for the seamless integration of operational constraints into the decision-making process. The effectiveness of the proposed MIP-DRL framework is validated through numerical simulations, which demonstrate its superior capability to enforce all operational constraints, achieve high-quality dispatch decisions, and outperform existing DRL algorithms.
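The core idea of maximising a trained Q-function only over constraint-satisfying actions can be sketched in a heavily simplified form: here the MIP over a DNN is replaced by enumeration of a discretised action grid, and the state-of-charge limits, the action grid, and the toy Q-function are all invented for illustration:

```python
import numpy as np

# Simplified stand-in for MIP-based action selection: keep only actions
# that respect the storage state-of-charge (SoC) limits, then maximise Q.
soc, soc_min, soc_max, dt = 0.9, 0.1, 1.0, 1.0   # assumed SoC state, hours

actions = np.linspace(-0.5, 0.5, 11)       # charge (+) / discharge (-) power
q_values = -(actions - 0.3) ** 2           # toy Q: unconstrained best is +0.3

soc_next = soc + actions * dt
feasible = (soc_next >= soc_min) & (soc_next <= soc_max)
best = actions[feasible][np.argmax(q_values[feasible])]
print(best)  # ~0.1: charging at 0.3 would overshoot soc_max, so 0.1 wins
```

The real framework replaces this enumeration with an exact MIP encoding of the Q-network, so the guarantee holds for continuous actions as well.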
Abstract: Identifying the most effective parameters governing the resistance of reinforced concrete connections (RCCs) is an important topic in structural engineering. In this study, first, a finite element (FE) model is developed for simulating the performance of RCCs under post-earthquake fire (PEF). Then surrogate models, including multiple linear regression (MLR), multiple natural-logarithm equation regression (MLnER), gene expression programming (GEP), and an ensemble model, are used to predict the remaining load-carrying capacity of an RCC under PEF. Statistical parameters, error terms, and a novel statistical table are used to evaluate and compare the accuracy of each surrogate model. According to the results, the ratio of the longitudinal reinforcement bars of the column (RLC) has a significant effect on the resistance of an RCC under PEF. Increasing the value of this parameter from 1% to 8% can increase the residual load-carrying capacity of an RCC under PEF by 492.2% when the RCC is exposed to fire at a temperature of 1000°C. Moreover, based on the results, the ensemble model can predict the residual load-carrying capacity with suitable accuracy. A safety factor of 1.55 should be applied to the results obtained from the ensemble model.
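The surrogate-plus-safety-factor workflow can be sketched with a single least-squares fit standing in for the paper's ensemble; the synthetic inputs, the linear relationship, and the feature names are invented, and only the 1.55 safety factor comes from the abstract:

```python
import numpy as np

# Minimal MLR surrogate on synthetic data, then a design safety factor.
rng = np.random.default_rng(1)
X = rng.uniform(1, 8, size=(50, 2))         # e.g. RLC (%) and a temperature proxy
y = 40.0 + 30.0 * X[:, 0] - 2.0 * X[:, 1]   # made-up capacity data (kN)

A = np.column_stack([np.ones(len(X)), X])   # design matrix with intercept
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

predicted = A @ coef                        # surrogate prediction
design = predicted / 1.55                   # apply the study's safety factor
print(bool(np.all(design < predicted)))     # design values sit below predictions
```

Dividing by the safety factor gives a conservative design capacity, which is how such a factor compensates for surrogate-model error.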
Abstract: Surface Penetrating Radar (SPR) is a recently developed technology for non-destructive testing. It can be used to image and interpret the inner structure of reinforced concrete. This paper gives the details of a compact, handheld SPR developed recently for reinforced-concrete structure detection. The center operating frequency of the radar is 1.6 GHz. Not only does it have fast acquisition ability, but it can also display the testing result on the LCD screen in real time. The testing results show that the radar has a penetrating range of more than 30 cm and a lateral resolution better than 5 cm. This performance validates that the radar can meet the application requirements for reinforced-concrete structure detection.
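A back-of-envelope calculation shows the two-way travel time such a radar must record to reach the reported 30 cm range. The relative permittivity of concrete varies with moisture and mix; the value of 6 used here is an assumption, not a figure from the paper:

```python
# Two-way travel time for a wave reaching 30 cm depth in concrete.
c = 3e8                    # free-space speed of light, m/s
eps_r = 6.0                # assumed relative permittivity of dry concrete
v = c / eps_r ** 0.5       # wave speed in the medium, ~1.22e8 m/s

depth = 0.30               # 30 cm, the reported penetrating range
t_two_way = 2 * depth / v  # down and back
print(round(t_two_way * 1e9, 2), "ns")  # ~4.9 ns of recording window
```

At a 1.6 GHz center frequency this window spans several wavelengths in the medium, consistent with resolving rebar-scale features.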