Research Advances on Data-driven Adaptive Critic Control

Abstract: The integration of optimal control and artificial intelligence has produced adaptive dynamic programming (ADP) methods, which are built mainly on the actor-critic design. By combining dynamic programming theory, reinforcement learning mechanisms, neural network technologies, and function optimization algorithms, ADP has achieved significant progress in solving decision-making and control problems for large-scale complex nonlinear systems. However, the unknown parameters and uncertain disturbances of real systems often make it difficult to establish accurate mathematical models, which poses challenges for the design of optimal controllers. In recent years, data-driven ADP methods with strong self-learning and adaptive capabilities have received widespread attention. Without relying on a dynamic model, these methods can design stable, safe, and reliable optimal controllers for complex nonlinear systems using only the input-output data of the system, which is in line with the trend toward intelligent automation. This paper reviews the algorithm implementation, theoretical properties, and related applications of data-driven ADP methods, with emphasis on the latest research progress, including online Q-learning, value-iteration-based Q-learning, policy-iteration-based Q-learning, accelerated Q-learning, transfer Q-learning, tracking Q-learning, safe Q-learning, and game Q-learning. It also covers the analysis of data learning paradigms, stability, convergence, and optimality. Furthermore, to enhance learning efficiency and control performance, some improved critic schemes and utility functions are designed. Finally, taking wastewater treatment processes as the application background, this paper summarizes the application results and open issues of data-driven ADP approaches in practical industrial systems, and outlines several future research directions.

Authors: WANG Ding; ZHAO Ming-Ming; LIU De-Rong; QIAO Jun-Fei; SONG Shi-Jie (School of Information Science and Technology, Beijing University of Technology, Beijing 100124; Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing 100124; Beijing Laboratory of Smart Environmental Protection, Beijing 100124; Beijing Institute of Artificial Intelligence, Beijing 100124; School of Automation and Intelligent Manufacturing, Southern University of Science and Technology, Shenzhen 518055; Institute of Smart City and Intelligent Transportation, Southwest Jiaotong University, Chengdu 611756)