Approximate dynamic programming (ADP) is a general and effective approach for solving optimal control and estimation problems by adapting to uncertain and nonconvex environments over time.
Deep reinforcement learning (RL) has become one of the most popular topics in artificial intelligence research. It has been widely used in various fields, such as end-to-end control, robotic control, recommendation systems, and natural language dialogue systems. In this survey, we systematically categorize deep RL algorithms and applications, and provide a detailed review of existing deep RL algorithms by dividing them into model-based methods, model-free methods, and advanced RL methods. We thoroughly analyze the advances including exploration, inverse RL, and transfer RL. Finally, we outline the current representative applications and analyze four open problems for future research.
Funding: Project supported by the National Natural Science Foundation of China (Nos. 61772541, 61872376, and 61932001).