In this article, a novel model-free coordinated optimal regulation design methodology is proposed for the rigidly connected dual permanent magnet synchronous motor (PMSM) system via adaptive dynamic programming (ADP). First, we adopt the classical master-slave structure to maintain torque synchronization by virtue of field-oriented control. Then, a reduced-order model of the dual-PMSM system is established through singular perturbation theory (SPT), which reduces the learning time and computational complexity of the outer speed loop design. Afterwards, we design a coordinated adaptive optimal regulator in the ADP framework that drives the speed of the girth gear to asymptotically track the reference signal and accommodates the load torque disturbance, without requiring knowledge of the system's model parameters. Using SPT, we analyze the suboptimality, closed-loop stability, and robustness properties of the obtained controller under mild conditions. Finally, comprehensive experimental studies verify that the proposed control strategy achieves speed regulation and torque synchronization and improves the transient response.
Learning-based methods have become mainstream for solving residential energy scheduling problems. To improve the learning efficiency of existing methods and increase the utilization of renewable energy, we propose the Dyna action-dependent heuristic dynamic programming (Dyna-ADHDP) method, which incorporates the ideas of learning and planning from the Dyna framework into action-dependent heuristic dynamic programming. This method defines a continuous action space for precise control of an energy storage system and allows online optimization of algorithm performance during real-time operation of the residential energy model. In addition, a target network is introduced during training to make the training smoother and more efficient. We conducted experimental comparisons with the benchmark method using simulated and real data to verify its applicability and performance. The results confirm the method's performance and generalization capabilities, as well as its effectiveness in increasing renewable energy utilization and extending equipment life.
In this paper, a distributed adaptive dynamic programming (ADP) framework based on value iteration is proposed for multi-player differential games. In the game setting, players have no access to the system parameters or control laws of the other players. Each player adopts an on-policy value iteration algorithm as the basic learning framework. To deal with the incomplete information structure, players collect system trajectory data over a period of time to compensate for the lack of information. The policy updating step is implemented as a nonlinear optimization problem that searches for the proximal admissible policy. Theoretical analysis shows that, by adopting proximal policy searching rules, the approximated policies converge to a neighborhood of the equilibrium policies. The efficacy of the method is illustrated by three examples, which also demonstrate that the proposed method can accelerate the learning process compared with the centralized learning framework.
This paper studies the problem of optimal parallel tracking control for continuous-time general nonlinear systems. Unlike existing optimal state feedback control, the control input of the optimal parallel control is introduced into the feedback system. However, because the control input enters the feedback system, optimal state feedback control methods cannot be applied directly. To address this problem, an augmented system and an augmented performance index function are first proposed. Thus, the general nonlinear system is transformed into an affine nonlinear system. The difference between the optimal parallel control and the optimal state feedback control is analyzed theoretically. It is proven that the optimal parallel control with the augmented performance index function can be seen as a suboptimal state feedback control with the traditional performance index function. Moreover, an adaptive dynamic programming (ADP) technique is used to implement the optimal parallel tracking control, with a critic neural network (NN) approximating the value function online. The stability of the closed-loop system is analyzed using Lyapunov theory, and the tracking error and NN weight errors are shown to be uniformly ultimately bounded (UUB). Also, the optimal parallel controller guarantees the continuity of the control input when the reference signals contain finite jump discontinuities. Finally, the effectiveness of the developed optimal parallel control method is verified in two cases.
Reinforcement learning (RL) has roots in dynamic programming and is called adaptive/approximate dynamic programming (ADP) within the control community. This paper reviews recent developments in ADP along with RL and its applications to various advanced control fields. First, the background of the development of ADP is described, emphasizing the significance of regulation and tracking control problems. Some effective offline and online algorithms for ADP/adaptive critic control are presented, and the main results for discrete-time systems and continuous-time systems are surveyed in turn. Then, the research progress on adaptive critic control based on the event-triggered framework and under uncertain environments is discussed, covering event-based design, robust stabilization, and game design. Moreover, the extensions of ADP for addressing control problems in complex environments have attracted enormous attention. The ADP architecture is revisited from the perspective of data-driven and RL frameworks, showing how they significantly promote ADP formulation. Finally, several typical control applications of RL and ADP are summarized, particularly in the fields of wastewater treatment processes and power systems, followed by some general prospects for future research. Overall, the comprehensive survey on ADP and RL for advanced control applications has demonstrated their remarkable potential in the artificial intelligence era. In addition, they play a vital role in promoting environmental protection and industrial intelligence.
The residential energy scheduling of solar energy is an important research area of the smart grid. On the demand side, factors such as household loads, storage batteries, the outside public utility grid, and renewable energy resources combine into a nonlinear, time-varying, indefinite, and complex system that is difficult to manage or optimize. Many nations have already applied residential real-time pricing to balance the burden on their grids. To enhance the electricity efficiency of the residential micro grid, this paper presents an action-dependent heuristic dynamic programming (ADHDP) method to solve the residential energy scheduling problem. The highlights of this paper are as follows. First, weather-type classification is adopted to establish three types of programming models based on the features of solar energy. In addition, the priorities of different energy resources are set to reduce losses in electrical energy transmission. Second, three ADHDP-based neural networks, which can update themselves during application, are designed to manage the flows of electricity. Third, simulation results show that the proposed scheduling method effectively reduces the total electricity cost and improves load balancing. A comparison with the particle swarm optimization algorithm further supports the cost-saving effect of the present method in energy management.
This paper presents a new design approach to achieve decentralized optimal control of high-dimensional complex singular systems with dynamic uncertainties. Based on the robust adaptive dynamic programming (robust ADP) method, controllers are designed to solve the optimal control problem for singular systems. The proposed algorithm works well when the system model is not exactly known but the input and output data can be measured. The policy iteration of each controller uses only its own state and input information for learning and does not need to know the whole system dynamics. Simulation results on the New England 10-machine 39-bus test system show the effectiveness of the designed controller.
In this paper, the multi-missile cooperative guidance system is formulated as a general nonlinear multi-agent system. To save limited communication resources, an adaptive event-triggered optimal guidance law is proposed by designing a synchronization-error-driven triggering condition, which combines consensus control with the Adaptive Dynamic Programming (ADP) technique. The developed event-triggered distributed control law can then be implemented by finding an approximate solution of the event-triggered coupled Hamilton-Jacobi-Bellman (HJB) equation. To address this issue, a critic network architecture is constructed, in which an adaptive weight updating law is designed to estimate the cooperative optimal cost function online. The event-triggered closed-loop system is then decomposed into two subsystems: the system with flow dynamics and the system with jump dynamics. Using the Lyapunov method, the stability of this closed-loop system is guaranteed and all signals are ensured to be Uniformly Ultimately Bounded (UUB). Furthermore, Zeno behavior is avoided. Simulation results are finally provided to demonstrate the effectiveness of the proposed method.
A policy iteration algorithm of adaptive dynamic programming (ADP) is developed to solve the optimal tracking control problem for a class of discrete-time chaotic systems. By system transformations, the optimal tracking problem is transformed into an optimal regulation problem. The policy iteration algorithm for discrete-time chaotic systems is first described. Then, the convergence and admissibility properties of the developed policy iteration algorithm are presented, which show that the transformed chaotic system can be stabilized under an arbitrary iterative control law and that the iterative performance index function simultaneously converges to the optimum. The policy iteration algorithm is implemented via neural networks, and the developed optimal tracking control scheme for chaotic systems is verified by a simulation.
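As a rough illustration of the policy iteration idea named in the abstract above, the following sketch applies it to a scalar linear-quadratic regulation problem rather than to the chaotic systems treated in the paper. The system x[k+1] = a*x[k] + b*u[k], the cost weights q and r, the initial gain k0, and both function names are assumptions made purely for illustration; the paper's neural-network implementation is not reproduced here.

```python
def policy_iteration_scalar_lqr(a, b, q, r, k0, iterations=50):
    """Policy iteration for x[k+1] = a*x[k] + b*u[k] with stage cost q*x^2 + r*u^2.

    The policy is u = -k*x. The initial gain k0 is assumed to be stabilizing,
    i.e. |a - b*k0| < 1 (an admissible initial control law).
    Returns (p, k), where V(x) = p*x^2 is the value function and k the gain.
    """
    k = k0
    if abs(a - b * k) >= 1.0:
        raise ValueError("initial gain must be stabilizing")
    p = 0.0
    for _ in range(iterations):
        closed_loop = a - b * k
        # Policy evaluation: solve p = q + r*k^2 + closed_loop^2 * p for p.
        p = (q + r * k * k) / (1.0 - closed_loop * closed_loop)
        # Policy improvement: minimise q*x^2 + r*u^2 + p*(a*x + b*u)^2 over u.
        k = a * b * p / (r + b * b * p)
    return p, k


def riccati_residual(a, b, q, r, p):
    """Residual of the scalar discrete-time algebraic Riccati equation at p."""
    return q + a * a * p - (a * b * p) ** 2 / (r + b * b * p) - p


if __name__ == "__main__":
    # Hypothetical numbers: an open-loop unstable plant with unit weights.
    p, k = policy_iteration_scalar_lqr(a=1.2, b=1.0, q=1.0, r=1.0, k0=1.0)
    print("p =", p, "k =", k)
    print("Riccati residual =", riccati_residual(1.2, 1.0, 1.0, 1.0, p))
```

Each pass evaluates the current gain exactly and then improves it greedily; in this linear-quadratic special case every improved gain remains stabilizing, which mirrors the admissibility property the abstract describes.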
In this paper, a novel adaptive Fault-Tolerant Control (FTC) strategy is proposed for non-minimum phase Hypersonic Vehicles (HSVs) that are affected by actuator faults and parameter uncertainties. The strategy is based on the output redefinition method and Adaptive Dynamic Programming (ADP). The intelligent FTC scheme consists of two main parts: a basic fault-tolerant and stable controller and an ADP-based supplementary controller. In the basic FTC part, an output redefinition approach is designed to make the zero dynamics stable with respect to the new output. Then, the Ideal Internal Dynamic (IID) is obtained using an optimal bounded inversion approach, and a tracking controller is designed for the new output to realize output tracking of the non-minimum phase HSV system. For the ADP-based compensation control part, Action-Dependent Heuristic Dynamic Programming (ADHDP) with an actor-critic learning structure is used to further optimize the tracking performance of the HSV control system. Finally, simulation results are provided to verify the effectiveness and efficiency of the proposed FTC algorithm.
To address the output feedback problem for linear discrete-time systems, this work proposes a new adaptive dynamic programming (ADP) technique based on the internal model principle (IMP). The proposed method, termed IMP-ADP, does not require complete state feedback, merely the measurement of input and output data. More specifically, based on the IMP, the output control problem is first converted into a stabilization problem. We then design an observer to reconstruct the full state of the system from the measured inputs and outputs. Moreover, the technique includes both a policy iteration algorithm and a value iteration algorithm to determine the optimal feedback gain without using a dynamic system model. Importantly, this approach does not require solving the regulator equation. Finally, the control method was tested on a grid-connected LCL inverter system to demonstrate that it provides the desired performance in terms of both tracking and disturbance rejection.
This paper studies data-driven learning-based methods for the finite-horizon optimal control of linear time-varying discrete-time systems. First, a novel finite-horizon Policy Iteration (PI) method for linear time-varying discrete-time systems is presented. Its connections with existing infinite-horizon PI methods are discussed. Then, both data-driven off-policy PI and Value Iteration (VI) algorithms are derived to find approximate optimal controllers when the system dynamics are completely unknown. Under mild conditions, the proposed data-driven off-policy algorithms converge to the optimal solution. Finally, the effectiveness and feasibility of the developed methods are validated by a practical example of spacecraft attitude control.
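For orientation, the finite-horizon optimal controller that such data-driven methods aim to approximate can be computed directly when the dynamics are known. The sketch below is that model-based baseline for a scalar time-varying system, not the paper's off-policy PI or VI algorithm; the system x[k+1] = a_seq[k]*x[k] + b_seq[k]*u[k], the weights q, r, and q_final, and the function names are assumptions introduced for illustration.

```python
def finite_horizon_lqr(a_seq, b_seq, q, r, q_final):
    """Backward Riccati recursion for x[k+1] = a_seq[k]*x[k] + b_seq[k]*u[k].

    Cost: sum over k of (q*x[k]^2 + r*u[k]^2) plus q_final*x[N]^2, N = len(a_seq).
    Returns (p_seq, gain_seq): p_seq[k] is the cost-to-go coefficient at step k,
    and u[k] = -gain_seq[k]*x[k] is the optimal time-varying feedback.
    """
    n = len(a_seq)
    p_seq = [0.0] * (n + 1)
    gain_seq = [0.0] * n
    p_seq[n] = q_final  # terminal condition
    for k in range(n - 1, -1, -1):
        a, b, p_next = a_seq[k], b_seq[k], p_seq[k + 1]
        gain_seq[k] = a * b * p_next / (r + b * b * p_next)
        p_seq[k] = q + a * a * p_next - a * b * p_next * gain_seq[k]
    return p_seq, gain_seq


def simulate_cost(a_seq, b_seq, q, r, q_final, gain_seq, x0):
    """Roll the closed loop forward from x0 and accumulate the total cost."""
    x, cost = x0, 0.0
    for a, b, g in zip(a_seq, b_seq, gain_seq):
        u = -g * x
        cost += q * x * x + r * u * u
        x = a * x + b * u
    return cost + q_final * x * x


if __name__ == "__main__":
    # Hypothetical time-varying coefficients over a horizon of five steps.
    a_seq = [1.0 + 0.1 * k for k in range(5)]
    b_seq = [1.0, 0.5, 1.0, 0.5, 1.0]
    p_seq, gains = finite_horizon_lqr(a_seq, b_seq, q=1.0, r=0.5, q_final=2.0)
    print("predicted cost:", p_seq[0] * 3.0 ** 2)
    print("simulated cost:", simulate_cost(a_seq, b_seq, 1.0, 0.5, 2.0, gains, 3.0))
```

The predicted cost p_seq[0]*x0^2 and the simulated closed-loop cost should agree up to rounding, which gives a simple consistency check for any learned approximation of the gains.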
In this paper, a stochastic linear quadratic optimal tracking scheme is proposed for unknown linear discrete-time (DT) systems based on an adaptive dynamic programming (ADP) algorithm. First, an augmented system composed of the original system and the command generator is constructed, and an augmented stochastic algebraic equation is derived from it. Next, to obtain the optimal control strategy, the stochastic case is converted into a deterministic one by system transformation, and an ADP algorithm is proposed with convergence analysis. To realize the ADP algorithm, three back-propagation neural networks, namely a model network, a critic network, and an action network, are devised to approximate the unknown system model, the optimal value function, and the optimal control strategy, respectively. Finally, the obtained optimal control strategy is applied to the original stochastic system, and two simulations are provided to demonstrate the effectiveness of the proposed algorithm.
An adaptive weighted stereo matching algorithm with multilevel and bidirectional dynamic programming based on ground control points (GCPs) is presented. To decrease time complexity without losing matching precision, a multilevel search scheme is used: coarse matching is performed in the typical disparity space image, while fine matching is performed in the disparity-offset space image. In the upper level, GCPs are obtained by an enhanced volumetric iterative algorithm that enforces the mutual constraint and the threshold constraint. Under the supervision of the highly reliable GCPs, a bidirectional dynamic programming framework is employed to resolve the inconsistency in the optimization path. In the lower level, to reduce running time, a disparity-offset space is proposed to efficiently obtain the dense disparity image. In addition, an adaptive dual support-weight strategy that considers photometric and geometric information is presented to aggregate the matching cost. Further, a post-processing algorithm based on dual thresholds improves disparity results in areas with depth discontinuities and in areas affected by occlusions, where missing stereo information is substituted from surrounding regions. To demonstrate the effectiveness of the algorithm, we present two groups of experimental results for four widely used standard stereo data sets, including a discussion of performance and a comparison with other methods, which show that the algorithm is not only fast but also significantly improves the efficiency of holistic optimization.
An optimal tracking control problem for a class of nonlinear systems with guaranteed performance and asymmetric input constraints is discussed in this paper. The control policy is implemented by an adaptive dynamic programming (ADP) algorithm under two event-based triggering mechanisms. Designing an optimal control law is often challenging because of the system deviation caused by asymmetric input constraints. First, a prescribed performance control technique is employed to keep the tracking errors within predetermined boundaries. Subsequently, to account for the asymmetric input constraints, a discounted non-quadratic cost function is introduced. Moreover, to reduce controller updates, an event-triggered control law is developed for the ADP algorithm. To further reduce the complexity of controller design, the work is then extended to a self-triggered case, which relaxes the need for continuous signal monitoring by hardware devices. Using the Lyapunov method, all signals are proved to be uniformly ultimately bounded. Finally, a simulation example on a mass–spring–damper system subject to asymmetric input constraints is provided to validate the effectiveness of the proposed control scheme.
Considering the economy and security of power system operation, this paper presents a new adaptive dynamic programming approach for security-constrained unit commitment (SCUC) problems. In response to the “curse of dimensionality” of dynamic programming, the approach solves the Bellman equation of SCUC approximately by solving a sequence of simplified single-stage optimization problems. An extended sequential truncation technique is proposed to explore the state space, and it outperforms traditional sequential truncation in the daily cost of unit commitment. Test cases ranging from 30 to 300 buses over a 24 h horizon are analyzed. Extensive numerical comparisons show that the proposed approach obtains optimal unit commitment schedules without any network or bus voltage violations while also minimizing the operation cost.
This paper studies motor joint control of a 4-degree-of-freedom (DoF) robotic manipulator using a learning-based Adaptive Dynamic Programming (ADP) approach. The manipulator's dynamics are modelled as an open-loop 4-link serial kinematic chain with 4 DoF. Decentralised optimal controllers are designed for each link using the ADP approach, based on a set of cost matrices and data collected from exploration trajectories. The proposed control strategy employs an off-line, off-policy iterative approach to derive four optimal control policies, one for each joint, under exploration strategies. The objective of each controller is to control the position of its joint. Experiments conducted on the Quanser Qarm robotic platform demonstrate the effectiveness of the proposed ADP controllers in handling significant dynamic nonlinearities, such as actuation limitations, output saturation, and filter delays. Simulation and experimental results show that four independent optimal controllers are found, each under similar exploration strategies, and that the proposed ADP approach successfully yields optimal linear control policies despite the presence of these complexities.
The paper develops a robust control approach for nonaffine nonlinear continuous systems with input constraints and unknown uncertainties. First, an affine augmented system (AAS) is constructed using a pre-compensation technique to convert the original nonaffine dynamics into affine dynamics. Second, a stability criterion linking the original nonaffine system and the auxiliary system is derived, demonstrating that the optimal policies obtained from the auxiliary system yield a robust controller for the nonaffine system. Third, an online adaptive dynamic programming (ADP) algorithm is designed to approximate the optimal solution of the Hamilton–Jacobi–Bellman (HJB) equation. Moreover, the gradient descent approach and the projection approach are employed to update the actor-critic neural network (NN) weights, and the convergence of the algorithm is proven. The uniformly ultimately bounded stability of the state is then guaranteed. Finally, simulation examples are provided to validate the effectiveness of the presented approach.
Funding (dual-PMSM coordinated regulation paper): Supported by the National Natural Science Foundation of China (62073327, 62403467, 62373090, 62273350, 62521001), the Natural Science Foundation of Jiangsu Province (BK20241635), the Postdoctoral Fellowship Program of China Postdoctoral Science Foundation (CPSF) (GZB20240827), the Jiangsu Funding Program for Excellent Postdoctoral Talent (2024ZB604), and the China Postdoctoral Science Foundation (2024M763545, 2025T054ZGMK).
Funding (Dyna-ADHDP residential energy scheduling paper): Supported in part by the National Key Research and Development Program of China (2024YFB4709100, 2021YFE0206100), the National Natural Science Foundation of China (62073321), the National Defense Basic Scientific Research Program (JCKY2019203C029), and the Science and Technology Development Fund, Macao SAR, China (0015/2020/AMJ).
Funding (distributed ADP for multi-player differential games paper): Supported by the Aeronautical Science Foundation of China (20220001057001) and an Open Project of the National Key Laboratory of Air-based Information Perception and Fusion (202437).
Funding (optimal parallel tracking control paper): Supported in part by the National Key Research and Development Program of China (2018AAA0101502, 2018YFB1702300), in part by the National Natural Science Foundation of China (61722312, 61533019, U1811463, 61533017), and in part by the Intel Collaborative Research Institute for Intelligent and Automated Connected Vehicles.
Funding (ADP and RL survey paper): Supported in part by the National Natural Science Foundation of China (62222301, 62073085, 62073158, 61890930-5, 62021003), the National Key Research and Development Program of China (2021ZD0112302, 2021ZD0112301, 2018YFC1900800-5), and the Beijing Natural Science Foundation (JQ19013).
Funding: Supported in part by the National Natural Science Foundation of China (61533017, U1501251, 61374105, 61722312).
Abstract: The residential energy scheduling of solar energy is an important research area of the smart grid. On the demand side, factors such as household loads, storage batteries, the outside public utility grid, and renewable energy resources combine into a nonlinear, time-varying, indefinite, and complex system, which is difficult to manage or optimize. Many nations have already applied residential real-time pricing to balance the burden on their grids. In order to enhance the electricity efficiency of the residential microgrid, this paper presents an action-dependent heuristic dynamic programming (ADHDP) method to solve the residential energy scheduling problem. The highlights of this paper are listed below. First, weather-type classification is adopted to establish three types of programming models based on the features of the solar energy. In addition, the priorities of different energy resources are set to reduce the loss of electrical energy in transmission. Second, three ADHDP-based neural networks, which can update themselves during application, are designed to manage the flows of electricity. Third, simulation results show that the proposed scheduling method effectively reduces the total electricity cost and improves load balancing. A comparison with the particle swarm optimization algorithm further indicates that the method is promising for cost-saving energy management.
Funding: Supported in part by the National Natural Science Foundation of China (61473070, 61433004, 61627809) and the SAPI Fundamental Research Funds (2018ZCX22).
Abstract: This paper presents a new design approach to achieve decentralized optimal control of high-dimensional complex singular systems with dynamic uncertainties. Based on the robust adaptive dynamic programming (robust ADP) method, controllers are designed to solve the optimal control problem for singular systems. The proposed algorithm works well when the system model is not exactly known but the input and output data can be measured. The policy iteration of each controller uses only its own state and input information for learning and does not need to know the whole system dynamics. Simulation results on the New England 10-machine 39-bus test system show the effectiveness of the designed controller.
Funding: Supported in part by the National Natural Science Foundation of China (61533017, 61273140, 61304079, 61374105, 61379099, 61233001), the Fundamental Research Funds for the Central Universities (FRF-TP-15-056A3), and the Open Research Project from SKLMCCS (20150104).
Funding: Co-supported by the National Natural Science Foundation of China (No. 62003036) and the China Postdoctoral Science Foundation (No. 2019TQ0037).
Abstract: In this paper, the multi-missile cooperative guidance system is formulated as a general nonlinear multi-agent system. To save limited communication resources, an adaptive event-triggered optimal guidance law is proposed by designing a synchronization-error-driven triggering condition, which brings together consensus control and the Adaptive Dynamic Programming (ADP) technique. The developed event-triggered distributed control law can then be obtained by finding an approximate solution of the event-triggered coupled Hamilton-Jacobi-Bellman (HJB) equation. To address this issue, a critic network architecture is constructed, in which an adaptive weight updating law is designed for estimating the cooperative optimal cost function online. The event-triggered closed-loop system is decomposed into two subsystems: the system with flow dynamics and the system with jump dynamics. By using the Lyapunov method, the stability of this closed-loop system is guaranteed and all signals are ensured to be Uniformly Ultimately Bounded (UUB). Furthermore, Zeno behavior is avoided. Simulation results are finally provided to demonstrate the effectiveness of the proposed method.
Funding: Supported by the National High Technology Research and Development Program of China (863 Program) (2006AA04Z183), the National Natural Science Foundation of China (60621001, 60534010, 60572070, 60774048, 60728307), and the Program for Changjiang Scholars and Innovative Research Groups of China (60728307, 4031002).
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 61034002, 61233001, 61273140, 61304086, and 61374105) and the Beijing Natural Science Foundation, China (Grant No. 4132078).
Abstract: A policy iteration algorithm of adaptive dynamic programming (ADP) is developed to solve the optimal tracking control problem for a class of discrete-time chaotic systems. By system transformations, the optimal tracking problem is transformed into an optimal regulation one. The policy iteration algorithm for discrete-time chaotic systems is first described. Then, the convergence and admissibility properties of the developed policy iteration algorithm are presented, which show that the transformed chaotic system can be stabilized under an arbitrary iterative control law and that the iterative performance index function simultaneously converges to the optimum. By implementing the policy iteration algorithm via neural networks, the developed optimal tracking control scheme for chaotic systems is verified by a simulation.
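As a rough illustration of the policy iteration loop this abstract refers to, the sketch below alternates policy evaluation and policy improvement for a scalar linear-quadratic regulation problem. It is not the paper's algorithm: the paper treats nonlinear chaotic systems with neural-network approximators, whereas the scalar plant `x[k+1] = a*x[k] + b*u[k]`, the stage cost `q*x^2 + r*u^2`, and all function and parameter names here are assumptions chosen so that both steps have closed forms.

```python
def policy_iteration_scalar_lqr(a, b, q, r, k0, iters=50, tol=1e-12):
    """Policy iteration for the scalar plant x[k+1] = a*x[k] + b*u[k]
    with stage cost q*x^2 + r*u^2 and linear policy u = -k*x.

    k0 must be stabilizing (|a - b*k0| < 1), the linear analogue of an
    admissible initial control law. Returns (p, k): the value-function
    coefficient in V(x) = p*x^2 and the improved feedback gain.
    """
    k = k0
    if abs(a - b * k) >= 1.0:
        raise ValueError("initial gain k0 is not stabilizing")
    p_prev = None
    for _ in range(iters):
        # Policy evaluation: solve p = q + r*k^2 + (a - b*k)^2 * p for p.
        a_cl = a - b * k
        p = (q + r * k * k) / (1.0 - a_cl * a_cl)
        # Policy improvement: minimize r*u^2 + p*(a*x + b*u)^2 over u.
        k = a * b * p / (r + b * b * p)
        if p_prev is not None and abs(p - p_prev) < tol:
            break
        p_prev = p
    return p, k


if __name__ == "__main__":
    # Hypothetical open-loop-unstable plant (a = 1.2) and unit weights.
    p, k = policy_iteration_scalar_lqr(a=1.2, b=1.0, q=1.0, r=1.0, k0=1.0)
    print("value coefficient p =", p, "feedback gain k =", k)
```

At convergence, `p` satisfies the discrete-time algebraic Riccati equation for this scalar problem, which is the linear counterpart of the optimal performance index function mentioned in the abstract.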
Funding: Supported in part by the Science Center Program of the National Natural Science Foundation of China (62373189, 62188101, 62020106003) and the Research Fund of the State Key Laboratory of Mechanics and Control for Aerospace Structures, China.
Abstract: In this paper, a novel adaptive Fault-Tolerant Control (FTC) strategy is proposed for non-minimum phase Hypersonic Vehicles (HSVs) that are affected by actuator faults and parameter uncertainties. The strategy is based on the output redefinition method and Adaptive Dynamic Programming (ADP). The intelligent FTC scheme consists of two main parts: a basic fault-tolerant and stable controller and an ADP-based supplementary controller. In the basic FTC part, an output redefinition approach is designed to make the zero dynamics stable with respect to the new output. Then, the Ideal Internal Dynamic (IID) is obtained using an optimal bounded inversion approach, and a tracking controller is designed for the new output to realize output tracking of the non-minimum phase HSV system. For the ADP-based compensation control part, an Action-Dependent Heuristic Dynamic Programming (ADHDP) scheme adopting an actor-critic learning structure is utilized to further optimize the tracking performance of the HSV control system. Finally, simulation results are provided to verify the effectiveness and efficiency of the proposed FTC algorithm.
Funding: Supported by the National Science Fund for Distinguished Young Scholars (62225303), the Fundamental Research Funds for the Central Universities (buctrc202201), the China Scholarship Council, and the High Performance Computing Platform, College of Information Science and Technology, Beijing University of Chemical Technology.
Abstract: To address the output feedback problem for linear discrete-time systems, this work proposes a new adaptive dynamic programming (ADP) technique based on the internal model principle (IMP). The proposed method, termed IMP-ADP, does not require complete state feedback, merely the measurement of input and output data. More specifically, based on the IMP, the output control problem can first be converted into a stabilization problem. We then design an observer to reproduce the full state of the system by measuring the inputs and outputs. Moreover, this technique includes both a policy iteration algorithm and a value iteration algorithm to determine the optimal feedback gain without using a dynamic system model. Importantly, with this concept one does not need to solve the regulator equation. Finally, the control method was tested on a grid-connected LCL inverter system to demonstrate that it provides the desired performance in terms of both tracking and disturbance rejection.
Funding: The work of B. Pang and Z.-P. Jiang has been supported in part by the National Science Foundation (No. ECCS-1501044).
Abstract: This paper studies data-driven learning-based methods for the finite-horizon optimal control of linear time-varying discrete-time systems. First, a novel finite-horizon Policy Iteration (PI) method for linear time-varying discrete-time systems is presented. Its connections with existing infinite-horizon PI methods are discussed. Then, both data-driven off-policy PI and Value Iteration (VI) algorithms are derived to find approximate optimal controllers when the system dynamics are completely unknown. Under mild conditions, the proposed data-driven off-policy algorithms converge to the optimal solution. Finally, the effectiveness and feasibility of the developed methods are validated by a practical example of spacecraft attitude control.
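For orientation, the finite-horizon optimal solution that such data-driven algorithms aim to recover can be computed directly when the model is known. The sketch below is that model-based baseline, not the paper's data-driven PI or VI method: it runs the standard backward Riccati recursion for a scalar time-varying plant `x[k+1] = a[k]*x[k] + b[k]*u[k]` with stage cost `q*x^2 + r*u^2` and terminal cost `qf*x^2`. The scalar setting, the numerical values, and all function names are assumptions made for illustration.

```python
def finite_horizon_lqr(a_seq, b_seq, q, r, qf):
    """Backward Riccati recursion for a scalar linear time-varying plant.

    Returns (p, gains), where p[k]*x^2 is the optimal cost-to-go from
    step k and u[k] = -gains[k]*x[k] is the optimal time-varying policy.
    """
    n = len(a_seq)
    p = [0.0] * (n + 1)
    gains = [0.0] * n
    p[n] = qf  # terminal condition
    for k in range(n - 1, -1, -1):
        a, b = a_seq[k], b_seq[k]
        gains[k] = a * b * p[k + 1] / (r + b * b * p[k + 1])
        p[k] = q + a * a * p[k + 1] - a * b * p[k + 1] * gains[k]
    return p, gains


def rollout_cost(a_seq, b_seq, q, r, qf, gains, x0):
    """Simulate u[k] = -gains[k]*x[k] from x0 and accumulate the cost."""
    x, cost = x0, 0.0
    for a, b, k in zip(a_seq, b_seq, gains):
        u = -k * x
        cost += q * x * x + r * u * u
        x = a * x + b * u
    return cost + qf * x * x


if __name__ == "__main__":
    # Hypothetical time-varying coefficients over a 5-step horizon.
    a_seq = [1.0 + 0.1 * k for k in range(5)]
    b_seq = [1.0] * 5
    p, gains = finite_horizon_lqr(a_seq, b_seq, q=1.0, r=1.0, qf=2.0)
    print("time-varying gains:", gains)
    print("cost from x0 = 2:", rollout_cost(a_seq, b_seq, 1.0, 1.0, 2.0, gains, 2.0))
```

Unlike the infinite-horizon case, the gain here changes at every step, which is why a finite-horizon method has to learn a sequence of value functions and policies rather than a single stationary pair.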
Funding: This work was supported by the National Natural Science Foundation of China (No. 61873248), the Hubei Provincial Natural Science Foundation of China (Nos. 2017CFA030, 2015CFA010), and the 111 Project (No. B17040).
Abstract: In this paper, a stochastic linear quadratic optimal tracking scheme is proposed for unknown linear discrete-time (DT) systems based on an adaptive dynamic programming (ADP) algorithm. First, an augmented system composed of the original system and the command generator is constructed, and an augmented stochastic algebraic equation is then derived based on the augmented system. Next, to obtain the optimal control strategy, the stochastic case is converted into a deterministic one by system transformation, and an ADP algorithm is proposed with convergence analysis. To implement the ADP algorithm, three back-propagation neural networks (a model network, a critic network, and an action network) are devised to approximate the unknown system model, the optimal value function, and the optimal control strategy, respectively. Finally, the obtained optimal control strategy is applied to the original stochastic system, and two simulations are provided to demonstrate the effectiveness of the proposed algorithm.
Funding: Supported by the National Natural Science Foundation of China (Nos. 60605023, 60775048) and the Specialized Research Fund for the Doctoral Program of Higher Education (No. 20060141006).
Abstract: An adaptive weighted stereo matching algorithm with multilevel and bidirectional dynamic programming based on ground control points (GCPs) is presented. To decrease time complexity without losing matching precision, a multilevel search scheme is used: coarse matching is processed in the typical disparity space image, while fine matching is processed in the disparity-offset space image. In the upper level, GCPs are obtained by an enhanced volumetric iterative algorithm enforcing the mutual constraint and the threshold constraint. Under the supervision of the highly reliable GCPs, a bidirectional dynamic programming framework is employed to resolve inconsistency in the optimization path. In the lower level, to reduce running time, a disparity-offset space is proposed to efficiently produce the dense disparity image. In addition, an adaptive dual support-weight strategy, which considers photometric and geometric information, is presented to aggregate the matching cost. Further, a post-processing step based on a dual threshold algorithm improves disparity results in areas with depth discontinuities and areas affected by occlusions, where missing stereo information is substituted from surrounding regions. To demonstrate the effectiveness of the algorithm, we present two groups of experimental results for four widely used standard stereo data sets, including a discussion of performance and a comparison with other methods, which show that the algorithm is not only fast but also significantly improves the efficiency of holistic optimization.
Funding: Supported in part by the National Natural Science Foundation of China (62033003, 62003093, 62373113, U23A20341, U21A20522) and the Natural Science Foundation of Guangdong Province, China (2023A1515011527, 2022A1515011506).
Abstract: An optimal tracking control problem for a class of nonlinear systems with guaranteed performance and asymmetric input constraints is discussed in this paper. The control policy is implemented by an adaptive dynamic programming (ADP) algorithm under two event-based triggering mechanisms. It is often challenging to design an optimal control law because of the system deviation caused by asymmetric input constraints. First, a prescribed performance control technique is employed to keep the tracking errors within predetermined boundaries. Subsequently, considering the asymmetric input constraints, a discounted non-quadratic cost function is introduced. Moreover, in order to reduce controller updates, an event-triggered control law is developed for the ADP algorithm. After that, to further simplify the controller design, this work is extended to a self-triggered case, which relaxes the need for continuous signal monitoring by hardware devices. By employing the Lyapunov method, the uniform ultimate boundedness of all signals is proved. Finally, a simulation example on a mass–spring–damper system subject to asymmetric input constraints is provided to validate the effectiveness of the proposed control scheme.
Abstract: Considering both the economy and the security of power system operation, this paper presents a new adaptive dynamic programming approach for security-constrained unit commitment (SCUC) problems. In response to the "curse of dimensionality" of dynamic programming, the approach solves the Bellman equation of SCUC approximately by solving a sequence of simplified single-stage optimization problems. An extended sequential truncation technique is proposed to explore the state space of the approach, which is superior to traditional sequential truncation in daily cost for unit commitment. Different test cases from 30 to 300 buses over a 24 h horizon are analyzed. Extensive numerical comparisons show that the proposed approach is capable of obtaining the optimal unit commitment schedules without any network or bus voltage violations, while also minimizing the operation cost.
Funding: Supported by the DEEPCOBOT project under Grant 306640/O70 funded by the Research Council of Norway.
Abstract: This paper studies motor joint control of a 4-degree-of-freedom (DoF) robotic manipulator using a learning-based Adaptive Dynamic Programming (ADP) approach. The manipulator's dynamics are modelled as an open-loop 4-link serial kinematic chain with 4 DoF. Decentralised optimal controllers are designed for each link using the ADP approach based on a set of cost matrices and data collected from exploration trajectories. The proposed control strategy employs an off-line, off-policy iterative approach to derive four optimal control policies, one for each joint, under exploration strategies. The objective of the controller is to control the position of each joint. Simulation and experimental results show that four independent optimal controllers are found, each under similar exploration strategies, and that the proposed ADP approach successfully yields optimal linear control policies despite the presence of these complexities. The experimental results obtained on the Quanser Qarm robotic platform demonstrate the effectiveness of the proposed ADP controllers in handling significant dynamic nonlinearities, such as actuation limitations, output saturation, and filter delays.
Funding: Project supported by the National Natural Science Foundation of China (Grant No. 62103408), the Beijing Nova Program (Grant No. 20240484516), and the Fundamental Research Funds for the Central Universities (Grant No. KG16314701).
Abstract: The paper develops a robust control approach for nonaffine nonlinear continuous systems with input constraints and unknown uncertainties. Firstly, this paper constructs an affine augmented system (AAS) with a pre-compensation technique to convert the original nonaffine dynamics into affine dynamics. Secondly, the paper derives a stability criterion linking the original nonaffine system and the auxiliary system, demonstrating that the optimal policies obtained from the auxiliary system yield a robust controller for the nonaffine system. Thirdly, an online adaptive dynamic programming (ADP) algorithm is designed for approximating the optimal solution of the Hamilton–Jacobi–Bellman (HJB) equation. Moreover, the gradient descent approach and the projection approach are employed for updating the actor-critic neural network (NN) weights, and the algorithm's convergence is proven. Then, the uniformly ultimately bounded stability of the state is guaranteed. Finally, simulation examples are offered to validate the effectiveness of the presented approach.