Journal Articles
13 articles found
1. Value Iteration-Based Distributed Adaptive Dynamic Programming for Multi-Player Differential Game With Incomplete Information
Authors: Yun Zhang, Yuqi Wang, Yunze Cai. IEEE/CAA Journal of Automatica Sinica, 2025, Issue 2: 436-447 (12 pages)
In this paper, a distributed adaptive dynamic programming (ADP) framework based on value iteration is proposed for multi-player differential games. In the game setting, players have no access to the information of others' system parameters or control laws. Each player adopts an on-policy value iteration algorithm as the basic learning framework. To deal with the incomplete information structure, players collect a period of system trajectory data to compensate for the lack of information. The policy updating step is implemented by a nonlinear optimization problem aiming to search for the proximal admissible policy. Theoretical analysis shows that by adopting proximal policy searching rules, the approximated policies can converge to a neighborhood of equilibrium policies. The efficacy of our method is illustrated by three examples, which also demonstrate that the proposed method can accelerate the learning process compared with the centralized learning framework.
Keywords: Distributed adaptive dynamic programming; incomplete information; multi-player differential game (MPDG); value iteration
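The entry above assumes familiarity with the value-iteration fixed point that the distributed ADP scheme builds on. As background only (not the paper's algorithm, which runs on continuous-time differential games with trajectory data), here is a minimal tabular sketch of value iteration on a made-up two-state MDP:

```python
import numpy as np

# Toy 2-state, 2-action MDP; all numbers are made up for illustration.
# P[a][s, s'] = transition probability under action a, R[s, a] = reward.
P = [np.array([[0.9, 0.1], [0.2, 0.8]]),   # action 0
     np.array([[0.5, 0.5], [0.4, 0.6]])]   # action 1
R = np.array([[1.0, 0.0], [0.0, 2.0]])
gamma = 0.9

V = np.zeros(2)
for _ in range(1000):
    # Bellman optimality backup: V(s) <- max_a [R(s,a) + gamma * E[V(s')]]
    Q = np.stack([R[:, a] + gamma * P[a] @ V for a in range(2)], axis=1)
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:   # converged to the fixed point
        break
    V = V_new

print("V* ~", V, " greedy policy:", Q.argmax(axis=1))
```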
2. Value Iteration-Based Cooperative Adaptive Optimal Control for Multi-Player Differential Games With Incomplete Information (cited: 1)
Authors: Yun Zhang, Lulu Zhang, Yunze Cai. IEEE/CAA Journal of Automatica Sinica (SCIE, EI, CSCD), 2024, Issue 3: 690-697 (8 pages)
This paper presents a novel cooperative value iteration (VI)-based adaptive dynamic programming method for multi-player differential game models with a convergence proof. The players are divided into two groups in the learning process and adapt their policies sequentially. Our method removes the dependence on admissible initial policies, which is one of the main drawbacks of PI-based frameworks. Furthermore, this algorithm enables the players to adapt their control policies without full knowledge of others' system parameters or control laws. The efficacy of our method is illustrated by three examples.
Keywords: Adaptive dynamic programming; incomplete information; multi-player differential game; value iteration
3. Accelerated Value Iteration for Nonlinear Zero-Sum Games with Convergence Guarantee
Authors: Yuan Wang, Mingming Zhao, Nan Liu, Ding Wang. Guidance, Navigation and Control, 2024, Issue 1: 121-148 (28 pages)
In this paper, an accelerated value iteration (VI) algorithm is established to solve the zero-sum game problem with a convergence guarantee. First, inspired by successive over-relaxation theory, the convergence rate of the iterative value function sequence is significantly accelerated by the relaxation factor. Second, the convergence and monotonicity of the value function sequence are analyzed under different ranges of the relaxation factor. Third, two practical approaches, namely the integrated scheme and the relaxation function, are introduced into the accelerated VI algorithm to guarantee the convergence of the iterative value function sequence for zero-sum games. The integrated scheme consists of an accelerated stage and a convergence stage, and the relaxation function can adjust the value of the relaxation factor. Finally, the performance of the accelerated VI algorithm is verified through two examples with practical physical backgrounds, including an autopilot controller.
Keywords: Adaptive dynamic programming; convergence rate; value iteration; zero-sum games
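The relaxation-factor idea in this abstract mirrors successive over-relaxation for fixed-point iterations. Below is a hedged sketch transplanted to a tabular MDP rather than the paper's zero-sum game setting; only the update V <- (1 - omega)V + omega T(V) and the role of the relaxation factor are taken from the abstract:

```python
import numpy as np

def bellman_backup(V, P, R, gamma):
    """One standard Bellman optimality backup T(V) on a tabular MDP."""
    return np.max(np.stack([R[:, a] + gamma * P[a] @ V
                            for a in range(len(P))], axis=1), axis=1)

def relaxed_vi(P, R, gamma, omega=1.2, tol=1e-10, iters=10000):
    # Over-relaxed update: V <- (1 - omega) * V + omega * T(V).
    # omega = 1 recovers classical VI; values slightly above 1 can
    # accelerate convergence (the paper analyses admissible ranges).
    V = np.zeros(R.shape[0])
    for k in range(iters):
        V_new = (1.0 - omega) * V + omega * bellman_backup(V, P, R, gamma)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, k
        V = V_new
    return V, iters
```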
4. Dynamic value iteration networks for the planning of rapidly changing UAV swarms (cited: 3)
Authors: Wei LI, Bowei YANG, Guanghua SONG, Xiaohong JIANG. Frontiers of Information Technology & Electronic Engineering (SCIE, EI, CSCD), 2021, Issue 5: 687-696 (10 pages)
In an unmanned aerial vehicle ad-hoc network (UANET), sparse and rapidly mobile unmanned aerial vehicles (UAVs)/nodes can dynamically change the UANET topology. This may lead to UANET service performance issues. In this study, for planning rapidly changing UAV swarms, we propose a dynamic value iteration network (DVIN) model trained using the episodic Q-learning method with the connection information of UANETs to generate a state value spread function, which enables UAVs/nodes to adapt to novel physical locations. We then evaluate the performance of the DVIN model and compare it with the non-dominated sorting genetic algorithm II and the exhaustive method. Simulation results demonstrate that the proposed model significantly reduces the decision-making time for UAV/node path planning with a high average success rate.
Keywords: Dynamic value iteration networks; episodic Q-learning; unmanned aerial vehicle (UAV) ad-hoc network; non-dominated sorting genetic algorithm II (NSGA-II); path planning
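The DVIN model is trained with episodic Q-learning. As a reference point, the plain tabular form of episodic Q-learning is sketched below; the `env` interface (reset/step) is an assumption for illustration, not the paper's UANET simulator:

```python
import numpy as np

def episodic_q_learning(env, n_states, n_actions,
                        episodes=500, alpha=0.1, gamma=0.95, eps=0.1):
    """Plain tabular episodic Q-learning; `env` is any object with
    reset() -> state and step(a) -> (state, reward, done)."""
    rng = np.random.default_rng(0)
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # epsilon-greedy exploration
            a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())
            s2, r, done = env.step(a)
            # one-step TD target; terminal states bootstrap to 0
            target = r + (0.0 if done else gamma * Q[s2].max())
            Q[s, a] += alpha * (target - Q[s, a])
            s = s2
    return Q
```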
5. Terminating Cycles for Iterated Difference Values of Four-Digit Integers
Journal of Yueyang University (岳阳大学学报), CAS, 1995, Issue 1: 4-12 (9 pages)
Since D. R. Kaprekar discovered the interesting property of the number 6174, an interesting mathematical model has been developed.
Keywords: terminating cycles; iterated difference values; four-digit integers
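The iterated-difference map behind this entry is Kaprekar's routine: sort the four digits in descending and ascending order and subtract; 6174 is the terminating value for every four-digit integer with at least two distinct digits. A small self-contained sketch:

```python
def kaprekar_step(n: int) -> int:
    """One iterated-difference step on a four-digit integer:
    digits sorted descending minus digits sorted ascending."""
    digits = f"{n:04d}"                  # keep leading zeros
    hi = int("".join(sorted(digits, reverse=True)))
    lo = int("".join(sorted(digits)))
    return hi - lo

def iterate_to_cycle(n: int, max_steps: int = 20):
    """Follow the iteration until a value repeats; 6174 is the fixed
    point for all four-digit numbers with >= 2 distinct digits."""
    seen, path = set(), [n]
    while n not in seen and len(path) <= max_steps:
        seen.add(n)
        n = kaprekar_step(n)
        path.append(n)
    return path

print(iterate_to_cycle(3524))   # -> [3524, 3087, 8352, 6174, 6174]
```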
6. Adaptive Optimal Discrete-Time Output-Feedback Using an Internal Model Principle and Adaptive Dynamic Programming (cited: 1)
Authors: Zhongyang Wang, Youqing Wang, Zdzisław Kowalczuk. IEEE/CAA Journal of Automatica Sinica (SCIE, EI, CSCD), 2024, Issue 1: 131-140 (10 pages)
In order to address the output feedback issue for linear discrete-time systems, this work suggests a new adaptive dynamic programming (ADP) technique based on the internal model principle (IMP). The proposed method, termed IMP-ADP, does not require complete state feedback, merely the measurement of input and output data. More specifically, based on the IMP, the output control problem can first be converted into a stabilization problem. We then design an observer to reproduce the full state of the system by measuring the inputs and outputs. Moreover, this technique includes both a policy iteration algorithm and a value iteration algorithm to determine the optimal feedback gain without using a dynamic system model. It is important that with this concept one does not need to solve the regulator equation. Finally, this control method was tested on a grid-connected LCL inverter system to demonstrate that the proposed method provides the desired performance in terms of both tracking and disturbance rejection.
Keywords: Adaptive dynamic programming (ADP); internal model principle (IMP); output feedback problem; policy iteration (PI); value iteration (VI)
7. Safe Q-Learning for Data-Driven Nonlinear Optimal Control With Asymmetric State Constraints
Authors: Mingming Zhao, Ding Wang, Shijie Song, Junfei Qiao. IEEE/CAA Journal of Automatica Sinica (CSCD), 2024, Issue 12: 2408-2422 (15 pages)
This article develops a novel data-driven safe Q-learning method to design a safe optimal controller which can guarantee that the constrained states of nonlinear systems always stay in the safe region while providing optimal performance. First, we design an augmented utility function consisting of an adjustable positive definite control obstacle function and a quadratic form of the next state to ensure safety and optimality. Second, by exploiting a pre-designed admissible policy for initialization, an off-policy stabilizing value iteration Q-learning (SVIQL) algorithm is presented to seek the safe optimal policy by using offline data within the safe region rather than the mathematical model. Third, the monotonicity, safety, and optimality of the SVIQL algorithm are theoretically proven. To obtain the initial admissible policy for SVIQL, an offline VIQL algorithm with zero initialization is constructed and a new admissibility criterion is established for immature iterative policies. Moreover, critic and action networks with precise approximation ability are established to support the operation of the VIQL and SVIQL algorithms. Finally, three simulation experiments are conducted to demonstrate the virtue and superiority of the developed safe Q-learning method.
Keywords: Adaptive critic control; adaptive dynamic programming (ADP); control barrier functions (CBF); stabilizing value iteration Q-learning (SVIQL); state constraints
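The abstract describes an augmented utility combining a positive definite control obstacle (barrier) function with a quadratic form of the next state. The exact barrier is not given in the abstract; the sketch below uses a reciprocal-log barrier over an asymmetric box constraint as one plausible instantiation, with every specific form an assumption:

```python
import numpy as np

def augmented_utility(x, u, x_next, Q, R, x_low, x_high, kappa=1.0):
    """Sketch of a barrier-augmented stage cost: quadratic terms in the
    state, control, and next state, plus a term that grows without bound
    as x approaches the asymmetric bounds x_low < x < x_high.
    The log-barrier form is an assumption, not the paper's exact
    control obstacle function."""
    quad = x @ Q @ x + u @ R @ u + x_next @ Q @ x_next
    width = x_high - x_low
    barrier = -kappa * np.sum(np.log((x_high - x) / width)
                              + np.log((x - x_low) / width))
    return quad + barrier
```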
8. Discounted Iterative Adaptive Critic Designs With Novel Stability Analysis for Tracking Control (cited: 9)
Authors: Mingming Ha, Ding Wang, Derong Liu. IEEE/CAA Journal of Automatica Sinica (SCIE, EI, CSCD), 2022, Issue 7: 1262-1272 (11 pages)
The core task of tracking control is to make the controlled plant track a desired trajectory. The traditional performance index used in previous studies cannot completely eliminate the tracking error as the number of time steps increases. In this paper, a new cost function is introduced to develop the value-iteration-based adaptive critic framework to solve the tracking control problem. Unlike the regulator problem, the iterative value function of the tracking control problem cannot be regarded as a Lyapunov function. A novel stability analysis method is developed to guarantee that the tracking error converges to zero. The discounted iterative scheme under the new cost function for the special case of linear systems is elaborated. Finally, the tracking performance of the present scheme is demonstrated by numerical results and compared with those of the traditional approaches.
Keywords: Adaptive critic design; adaptive dynamic programming (ADP); approximate dynamic programming; discrete-time nonlinear systems; reinforcement learning; stability analysis; tracking control; value iteration (VI)
9. User Association and Power Allocation for UAV-Assisted Networks: A Distributed Reinforcement Learning Approach (cited: 6)
Authors: Xin Guan, Yang Huang, Chao Dong, Qihui Wu. China Communications (SCIE, CSCD), 2020, Issue 12: 110-122 (13 pages)
Unmanned aerial vehicles (UAVs) can be employed as aerial base stations (BSs) due to their high mobility and flexible deployment. This paper focuses on a UAV-assisted wireless network, where users can be scheduled to get access to either an aerial BS or a terrestrial BS for uplink transmission. In contrast to state-of-the-art designs focusing on the instantaneous cost of the network, this paper aims at minimizing the long-term average transmit power consumed by the users by dynamically optimizing user association and power allocation in each time slot. Such a joint user association scheduling and power allocation problem can be formulated as a Markov decision process (MDP). Unfortunately, solving such an MDP problem with the conventional relative value iteration (RVI) can suffer from the curse of dimensionality in the presence of a large number of users. As a countermeasure, we propose a distributed RVI algorithm to reduce the dimension of the MDP problem, such that the original problem can be decoupled into multiple solvable small-scale MDP problems. Simulation results reveal that the proposed algorithm can yield lower long-term average transmit power consumption than both the conventional RVI algorithm and a baseline algorithm with myopic policies.
Keywords: user association; power allocation; long-term average cost; Markov decision process; relative value iteration; curse of dimensionality
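Relative value iteration, the baseline that the paper decomposes, is the average-cost analogue of VI: subtracting the value at a reference state keeps the iterates bounded. A standard tabular sketch (not the paper's distributed variant) follows:

```python
import numpy as np

def relative_value_iteration(P, C, ref=0, tol=1e-9, iters=100000):
    """Classical relative value iteration for an average-cost MDP.
    P[a][s, s'] transition probs, C[s, a] stage costs; returns the
    bias vector h and the optimal average cost (value at `ref`)."""
    n = C.shape[0]
    h = np.zeros(n)
    for _ in range(iters):
        Th = np.min(np.stack([C[:, a] + P[a] @ h
                              for a in range(len(P))], axis=1), axis=1)
        g = Th[ref]              # average-cost estimate
        h_new = Th - g           # subtract to keep iterates bounded
        if np.max(np.abs(h_new - h)) < tol:
            return h_new, g
        h = h_new
    return h, g
```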
10. Reinforcement learning-based scheduling of multi-battery energy storage system (cited: 2)
Authors: CHENG Guangran, DONG Lu, YUAN Xin, SUN Changyin. Journal of Systems Engineering and Electronics (SCIE, EI, CSCD), 2023, Issue 1: 117-128 (12 pages)
In this paper, a reinforcement learning-based multi-battery energy storage system (MBESS) scheduling policy is proposed to minimize the consumers' electricity cost. The MBESS scheduling problem is modeled as a Markov decision process (MDP) with unknown transition probability. However, the optimal value function is time-dependent and difficult to obtain because of the periodicity of the electricity price and residential load. Therefore, a series of time-independent action-value functions are proposed to describe every period of a day. To approximate every action-value function, a corresponding critic network is established, which is cascaded with other critic networks according to the time sequence. Then, the continuous management strategy is obtained from the related action network. Moreover, a two-stage learning protocol including offline and online learning stages is provided for detailed implementation in real-time battery management. Numerical experimental examples are given to demonstrate the effectiveness of the developed algorithm.
Keywords: multi-battery energy storage system (MBESS); reinforcement learning; periodic value iteration; data-driven
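The cascaded, per-period critics described above can be pictured in tabular form as one action-value table per period of the day, each bootstrapping from the next period's table. The sizes and update below are illustrative placeholders, not the paper's networks:

```python
import numpy as np

# One action-value table per period of the day (T periods), each
# bootstrapping from the next period's table, cyclically. Sizes and
# hyperparameters are placeholders.
T, n_states, n_actions = 24, 10, 3
Q = np.zeros((T, n_states, n_actions))

def backup(t, s, a, r, s_next, gamma=0.99, alpha=0.05):
    """TD update for period t: the target bootstraps from period
    (t + 1) mod T, mirroring the cascaded-critic structure."""
    target = r + gamma * Q[(t + 1) % T, s_next].max()
    Q[t, s, a] += alpha * (target - Q[t, s, a])
```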
11. ITERATIVE POSITIVE SOLUTIONS FOR SINGULAR RIEMANN-STIELTJES INTEGRAL BOUNDARY VALUE PROBLEM
Authors: Xiuli Lin, Zengqin Zhao. Annals of Applied Mathematics, 2016, Issue 2: 133-140 (8 pages)
By applying an iterative technique, we obtain the existence of positive solutions for a singular Riemann-Stieltjes integral boundary value problem in the case that f(t, u) is non-increasing with respect to u.
Keywords: Riemann-Stieltjes integral boundary value problems; positive solution; non-increasing; iterative technique
12. THE SYMMETRIC POSITIVE SOLUTIONS OF 2n-ORDER BOUNDARY VALUE PROBLEMS ON TIME SCALES
Authors: Yangyang Yu, Linlin Wang, Yonghong Fan. Annals of Applied Mathematics, 2016, Issue 3: 311-321 (11 pages)
In this paper, we are concerned with the symmetric positive solutions of 2n-order boundary value problems on time scales. By using the induction principle, the symmetric form of the Green's function is established. In order to construct a necessary and sufficient condition for the existence result, an iterative technique is used. As an application, an example is given to illustrate our main result.
Keywords: symmetric positive solutions; boundary value problems; induction principle; time scales; iterative technique
13. Multilevel Techniques for the Solution of HJB Minimum-Time Control Problems
Authors: CIARAMELLA Gabriele, FABRINI Giulia. Journal of Systems Science & Complexity (SCIE, EI, CSCD), 2021, Issue 6: 2069-2091 (23 pages)
The solution of minimum-time feedback optimal control problems is generally achieved using the dynamic programming approach, in which the value function must be computed on numerical grids with a very large number of points. Classical numerical strategies, such as value iteration (VI) or policy iteration (PI) methods, become very inefficient if the number of grid points is large. This is a strong limitation to their use in real-world applications. To address this problem, the authors present a novel multilevel framework, where classical VI and PI are embedded in a full-approximation storage (FAS) scheme. In fact, the authors show that VI and PI have excellent smoothing properties, a fact that makes them very suitable for use in multilevel frameworks. Moreover, a new smoother is developed by accelerating VI using Anderson's extrapolation technique. The effectiveness of the new scheme is demonstrated by several numerical experiments.
Keywords: Anderson acceleration; FAS; Hamilton-Jacobi equation; minimum-time problem; multi-level acceleration methods; policy iteration; value iteration
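Anderson extrapolation, which the paper uses to build a faster VI smoother, accelerates any fixed-point map V -> T(V). A depth-one sketch is given below; `T_op` stands for an abstract Bellman backup, and the scheme is generic Anderson acceleration, not the authors' multilevel smoother:

```python
import numpy as np

def anderson_vi(T_op, V0, tol=1e-10, iters=5000):
    """Depth-one Anderson extrapolation around a fixed-point map
    V -> T_op(V) (e.g. a Bellman backup on a grid). A sketch of the
    kind of acceleration the abstract describes."""
    V = np.asarray(V0, dtype=float)
    g_prev = T_op(V)
    f_prev = g_prev - V
    V = g_prev                         # first step is plain VI
    for k in range(iters):
        g = T_op(V)
        f = g - V                      # fixed-point residual
        if np.linalg.norm(f) < tol:
            return V, k
        df = f - f_prev
        denom = df @ df
        # least-squares mixing weight; gamma = 0 recovers plain VI
        gamma = (f @ df) / denom if denom > 1e-14 else 0.0
        V, g_prev, f_prev = g - gamma * (g - g_prev), g, f
    return V, iters
```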