Funding: Supported in part by the National Key Research and Development Program of China (2024YFB4709100, 2021YFE0206100), the National Natural Science Foundation of China (62073321), the National Defense Basic Scientific Research Program (JCKY2019203C029), and the Science and Technology Development Fund, Macao SAR, China (0015/2020/AMJ).
Abstract: Learning-based methods have become mainstream for solving residential energy scheduling problems. To improve the learning efficiency of existing methods and increase the utilization of renewable energy, we propose the Dyna action-dependent heuristic dynamic programming (Dyna-ADHDP) method, which incorporates the ideas of learning and planning from the Dyna framework into action-dependent heuristic dynamic programming. The method defines a continuous action space for precise control of an energy storage system and allows online optimization of algorithm performance during real-time operation of the residential energy model. A target network is introduced during training to make learning smoother and more efficient. We conducted experimental comparisons with a benchmark method on simulated and real data to verify applicability and performance. The results confirm the method's excellent performance and generalization capability, as well as its strength in increasing renewable energy utilization and extending equipment life.
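The learning-plus-planning loop described above can be pictured concretely. The following Python sketch is a minimal, hypothetical rendition (not the paper's implementation): a linear critic with a soft-updated target copy, a learned linear transition model, and Dyna-style planning updates on model-generated transitions, driving a toy battery-scheduling environment invented for illustration.

```python
# Hypothetical Dyna-style ADHDP loop: direct TD learning on real transitions,
# model learning, and extra "planning" updates on simulated transitions.
import numpy as np

rng = np.random.default_rng(0)

def features(s, a):
    # Quadratic features of the state-action pair for a linear critic.
    z = np.concatenate([s, a])
    return np.concatenate([[1.0], z, z * z])

def step(s, a):
    # Toy dynamics (assumption): state = [state of charge, net household load].
    soc = np.clip(s[0] + 0.9 * a[0], 0.0, 1.0)
    load = 0.8 * s[1] + 0.1 * rng.standard_normal()
    cost = (load - a[0]) ** 2 + 0.01 * a[0] ** 2     # unmet load + wear proxy
    return np.array([soc, load]), cost

dim = len(features(np.zeros(2), np.zeros(1)))
w, w_target = np.zeros(dim), np.zeros(dim)           # critic and target critic
W_model = np.zeros((2, 4))                           # linear model s' ~ W [s; a; 1]
gamma, alpha, tau = 0.95, 0.05, 0.01

def greedy_action(s, w):
    # Continuous action by coarse search (stand-in for actor gradient descent).
    grid = np.linspace(-1.0, 1.0, 21)
    return np.array([min(grid, key=lambda a: w @ features(s, np.array([a])))])

s, replay = np.array([0.5, 0.0]), []
for t in range(2000):
    a = greedy_action(s, w) + 0.1 * rng.standard_normal(1)   # exploration
    s2, c = step(s, a)
    # (1) Direct RL: one-step TD update of the critic against the target net.
    td_target = c + gamma * (w_target @ features(s2, greedy_action(s2, w)))
    w -= alpha * (w @ features(s, a) - td_target) * features(s, a)
    # (2) Model learning: least-mean-squares fit of the transition model.
    x = np.concatenate([s, a, [1.0]])
    W_model += 0.05 * np.outer(s2 - W_model @ x, x)
    replay.append((s, a))
    # (3) Planning: extra critic updates on model-simulated transitions,
    # crudely re-using the cost formula on the predicted next state.
    for _ in range(5):
        sp, ap = replay[rng.integers(len(replay))]
        sp2 = W_model @ np.concatenate([sp, ap, [1.0]])
        cp = (sp2[1] - ap[0]) ** 2 + 0.01 * ap[0] ** 2
        tdp = cp + gamma * (w_target @ features(sp2, greedy_action(sp2, w)))
        w -= alpha * (w @ features(sp, ap) - tdp) * features(sp, ap)
    w_target += tau * (w - w_target)                 # soft target-network update
    s = s2
```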
Funding: Supported by the Aeronautical Science Foundation of China (20220001057001) and an Open Project of the National Key Laboratory of Air-based Information Perception and Fusion (202437).
Abstract: In this paper, a distributed adaptive dynamic programming (ADP) framework based on value iteration is proposed for multi-player differential games. In the game setting, players have no access to information about the other players' system parameters or control laws. Each player adopts an on-policy value iteration algorithm as the basic learning framework. To deal with this incomplete information structure, players collect a period of system trajectory data to compensate for the lack of information. The policy-updating step is implemented as a nonlinear optimization problem that searches for the proximal admissible policy. Theoretical analysis shows that by adopting proximal policy searching rules, the approximated policies converge to a neighborhood of the equilibrium policies. The efficacy of the method is illustrated by three examples, which also demonstrate that it can accelerate the learning process compared with a centralized learning framework.
Funding: Supported in part by the Science Center Program of the National Natural Science Foundation of China (62373189, 62188101, 62020106003) and the Research Fund of the State Key Laboratory of Mechanics and Control for Aerospace Structures, China.
Abstract: In this paper, a novel adaptive Fault-Tolerant Control (FTC) strategy is proposed for non-minimum phase Hypersonic Vehicles (HSVs) affected by actuator faults and parameter uncertainties. The strategy is based on the output redefinition method and Adaptive Dynamic Programming (ADP). The intelligent FTC scheme consists of two main parts: a basic fault-tolerant and stable controller and an ADP-based supplementary controller. In the basic FTC part, an output redefinition approach is designed to make the zero dynamics stable with respect to the new output. The Ideal Internal Dynamics (IID) are then obtained using an optimal bounded inversion approach, and a tracking controller is designed for the new output to realize output tracking of the non-minimum phase HSV system. For the ADP-based compensation part, Action-Dependent Heuristic Dynamic Programming (ADHDP) with an actor-critic learning structure is utilized to further optimize the tracking performance of the HSV control system. Finally, simulation results verify the effectiveness and efficiency of the proposed FTC algorithm.
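To make the actor-critic structure of ADHDP concrete, here is a minimal sketch on a toy linear plant; the output-redefinition, IID, and fault-tolerance machinery of the paper are omitted, and the plant, feature map, and step sizes are assumptions for illustration only.

```python
# Generic ADHDP actor-critic sketch: critic Q(x,u) = w'phi(x,u) learned by TD,
# linear actor u = theta @ x improved by descending dQ/du (cost minimization).
import numpy as np

rng = np.random.default_rng(0)
nx = 2
theta = np.zeros(nx)                        # linear actor weights

def phi(x, u):
    z = np.concatenate([x, [u]])
    return np.concatenate([[1.0], z, z * z])

w = np.zeros(1 + 2 * (nx + 1))              # quadratic critic weights
gamma, a_c, a_a = 0.95, 0.05, 0.01

def step(x, u):
    # Toy unstable plant (assumption), quadratic stage cost.
    x2 = np.array([x[0] + 0.1 * x[1], 1.05 * x[1] + 0.1 * u])
    return x2, x @ x + 0.1 * u * u

x = np.array([1.0, -0.5])
for t in range(3000):
    u = theta @ x + 0.1 * rng.standard_normal()       # exploratory action
    x2, c = step(x, u)
    # Critic: one-step TD toward c + gamma * Q(x2, actor(x2)).
    delta = w @ phi(x, u) - (c + gamma * (w @ phi(x2, theta @ x2)))
    w -= a_c * delta * phi(x, u)
    # Actor: dQ/du from the quadratic features (linear + squared u terms).
    dQdu = w[1 + nx] + 2.0 * w[1 + (nx + 1) + nx] * u
    theta -= a_a * dQdu * x                           # chain rule: du/dtheta = x
    x = x2 if np.linalg.norm(x2) < 10 else np.array([1.0, -0.5])  # reset
print(theta)
```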
Funding: Supported in part by the National Natural Science Foundation of China (62222301, 62073085, 62073158, 61890930-5, 62021003), the National Key Research and Development Program of China (2021ZD0112302, 2021ZD0112301, 2018YFC1900800-5), and the Beijing Natural Science Foundation (JQ19013).
Abstract: Reinforcement learning (RL) has roots in dynamic programming, and within the control community it is called adaptive/approximate dynamic programming (ADP). This paper reviews recent developments in ADP and RL and their applications to various advanced control fields. First, the background of the development of ADP is described, emphasizing the significance of regulation and tracking control problems. Effective offline and online algorithms for ADP/adaptive critic control are presented, surveying the main results for discrete-time and continuous-time systems, respectively. Then, research progress on adaptive critic control under the event-triggered framework and in uncertain environments is discussed, reviewing event-based design, robust stabilization, and game design. Moreover, extensions of ADP for addressing control problems in complex environments have attracted enormous attention. The ADP architecture is revisited from the perspective of data-driven and RL frameworks, showing how they significantly advance the ADP formulation. Finally, several typical control applications of RL and ADP are summarized, particularly in the fields of wastewater treatment processes and power systems, followed by general prospects for future research. Overall, this comprehensive survey of ADP and RL for advanced control applications demonstrates their remarkable potential in the artificial intelligence era, as well as their role in promoting environmental protection and industrial intelligence.
Funding: Supported by the National Science Fund for Distinguished Young Scholars (62225303), the Fundamental Research Funds for the Central Universities (buctrc202201), the China Scholarship Council, and the High Performance Computing Platform, College of Information Science and Technology, Beijing University of Chemical Technology.
Abstract: To address the output feedback problem for linear discrete-time systems, this work proposes a new adaptive dynamic programming (ADP) technique based on the internal model principle (IMP). The proposed method, termed IMP-ADP, does not require complete state feedback, merely measurements of input and output data. More specifically, based on the IMP, the output control problem is first converted into a stabilization problem. An observer is then designed to reproduce the full state of the system from the measured inputs and outputs. Moreover, the technique includes both a policy iteration algorithm and a value iteration algorithm to determine the optimal feedback gain without using a dynamic system model. Notably, with this approach one does not need to solve the regulator equation. Finally, the control method was tested on a grid-connected LCL inverter system to demonstrate that it provides the desired performance in terms of both tracking and disturbance rejection.
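The value-iteration component can be illustrated in its simplest, model-based form. Note that the paper's IMP-ADP variant is data-driven and uses only input/output measurements, whereas this sketch assumes known system matrices and a toy double-integrator model.

```python
# Model-based value iteration for the discrete-time LQR gain: iterate the
# Bellman/Riccati update until the value matrix P converges, then read off K.
import numpy as np

def lqr_value_iteration(A, B, Q, R, iters=500, tol=1e-10):
    n = A.shape[0]
    P = np.zeros((n, n))                       # value function V(x) = x' P x
    for _ in range(iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)   # greedy gain
        P_next = Q + A.T @ P @ (A - B @ K)                  # Bellman update
        if np.max(np.abs(P_next - P)) < tol:
            return P_next, K
        P = P_next
    return P, K

A = np.array([[1.0, 0.1], [0.0, 1.0]])         # toy double-integrator model
B = np.array([[0.0], [0.1]])
P, K = lqr_value_iteration(A, B, Q=np.eye(2), R=np.array([[1.0]]))
print("optimal gain K =", K)                   # u = -K x stabilizes the plant
```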
Funding: Shaanxi Science Fund for Distinguished Young Scholars (2024JC-JCQN-57), Xi'an Science and Technology Plan Project (2023JH-QCYJQ-0086), Scientific Research Program Funded by the Education Department of Shaanxi Provincial Government (P23JP071), Engineering Technology Research Center of Shaanxi Province for Intelligent Testing and Reliability Evaluation of Electronic Equipments (2023-ZC-GCZX-0047), and the 2022 Shaanxi University Youth Innovation Team Project.
Abstract: The use of dynamic programming (DP) algorithms to learn Bayesian network structures is limited by their high space complexity and the difficulty of learning the structure of large-scale networks. This study therefore proposes a DP algorithm based on node block sequence constraints. The proposed algorithm constrains the traversal of the parent graph using the M-sequence matrix, and considerably reduces time consumption and space complexity by pruning the traversal of the order graph with the node block sequence. Experimental results show that, compared with existing DP algorithms, the proposed algorithm obtains learning results more efficiently with less than 1% loss of accuracy, and can be used to learn larger-scale networks.
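For context, the unpruned DP that such constraints accelerate looks roughly as follows: a Silander-Myllymäki-style recursion over subsets of variables, with a placeholder local score (the M-sequence and node-block pruning of the paper are not reproduced here).

```python
# Exact DP over variable subsets for Bayesian-network structure learning:
# pick the sink of a topological order, give it its best parent set from the
# remaining variables, and recurse on the rest. Exponential in n by design.
from functools import lru_cache
from itertools import combinations

N = 4                                    # toy problem: 4 variables, ids 0..3

def local_score(v, parents):
    # Hypothetical decomposable score; replace with BIC/BDeu from data.
    return (v - sum(parents)) ** 2 + 0.5 * len(parents)

@lru_cache(maxsize=None)
def best_parents(v, candidates):
    # Best parent set for v chosen among 'candidates' (a frozenset).
    return min(((local_score(v, ps), ps)
                for k in range(len(candidates) + 1)
                for ps in combinations(sorted(candidates), k)),
               key=lambda t: t[0])

@lru_cache(maxsize=None)
def best_net(varset):
    # DP over the lattice of subsets ("order graph"): smaller score is better.
    if not varset:
        return 0.0, ()
    best = None
    for sink in varset:
        rest = frozenset(varset - {sink})
        s_par, ps = best_parents(sink, rest)
        s_rest, arcs = best_net(rest)
        cand = (s_rest + s_par, arcs + ((sink, ps),))
        if best is None or cand[0] < best[0]:
            best = cand
    return best

score, structure = best_net(frozenset(range(N)))
print(score, structure)                  # optimal score and per-node parent sets
```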
Abstract: An alpha-uniformized Markov chain is defined via the concept of an equivalent infinitesimal generator for a semi-Markov decision process (SMDP) with both average and discounted criteria. According to the relations between their performance measures and performance potentials, the optimization of an SMDP can be realized by simulating the chain. For the critic model of neuro-dynamic programming (NDP), a neuro-policy iteration (NPI) algorithm is presented, and the performance error bound is derived, accounting for the approximation error and improvement error in each iteration step. The obtained results can be extended to Markov systems and are widely applicable. Finally, a numerical example is provided.
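Uniformization itself is a one-line construction; the sketch below uses an invented three-state generator, not the paper's SMDP setting.

```python
# Alpha-uniformization: a continuous-time chain with generator Q becomes a
# discrete-time chain P = I + Q / alpha, valid whenever alpha >= max_i |q_ii|.
import numpy as np

Q = np.array([[-2.0, 2.0, 0.0],        # infinitesimal generator: rows sum to 0
              [1.0, -3.0, 2.0],
              [0.0, 4.0, -4.0]])
alpha = np.max(np.abs(np.diag(Q)))     # uniformization rate
P = np.eye(Q.shape[0]) + Q / alpha     # transition matrix of uniformized chain

assert np.all(P >= 0) and np.allclose(P.sum(axis=1), 1.0)
print(P)   # the uniformized chain can now be simulated/optimized in discrete time
```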
Funding: Supported by the National Natural Science Foundation of China (62103408), the Beijing Nova Program (20240484516), and the Fundamental Research Funds for the Central Universities (KG16314701).
Abstract: This paper develops a robust control approach for nonaffine nonlinear continuous systems with input constraints and unknown uncertainties. First, an affine augmented system (AAS) is constructed via a pre-compensation technique that converts the original nonaffine dynamics into affine dynamics. Second, a stability criterion linking the original nonaffine system and the auxiliary system is derived, demonstrating that the optimal policies obtained from the auxiliary system yield a robust controller for the nonaffine system. Third, an online adaptive dynamic programming (ADP) algorithm is designed to approximate the optimal solution of the Hamilton-Jacobi-Bellman (HJB) equation. Moreover, gradient descent and projection approaches are employed to update the actor-critic neural network (NN) weights, and the algorithm's convergence is proven. Uniformly ultimately bounded stability of the state is then guaranteed. Finally, simulation examples validate the effectiveness of the presented approach.
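The pre-compensation trick is easy to state in code: append the physical input to the state and treat its rate as the new control, making the augmented dynamics affine. The toy nonaffine dynamics and the rate feedback below are assumptions for illustration, not the paper's system.

```python
# Input augmentation for a nonaffine system: with z = [x, u] and u_dot = v,
# the augmented system z_dot = F(z) + G v is affine in the new control v.
import numpy as np

def f(x, u):
    # Hypothetical nonaffine scalar dynamics: the control enters through tanh
    # and a cubic term, so x_dot is not affine in u.
    return -x + np.tanh(x * u) + u ** 3

def augmented_rhs(z, v):
    x, u = z
    F = np.array([f(x, u), 0.0])       # drift of the augmented state
    G = np.array([0.0, 1.0])           # the new control v acts only on u
    return F + G * v

# Forward-Euler rollout under a simple (arbitrary) rate feedback, just to
# exercise the augmented model.
z, dt = np.array([1.0, 0.0]), 0.01
for _ in range(500):
    v = -2.0 * z[1] - z[0]
    z = z + dt * augmented_rhs(z, v)
print(z)
```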
Funding: Supported in part by the European Regional Development Fund under Grant KK.01.1.1.01.0009 (DATACROSS).
Abstract: This paper deals with loss reduction in an electric power distribution system through dynamic reconfiguration, using a case study of the grid in the city of Mostar, Bosnia and Herzegovina. The proposed solution is based on a nonlinear model predictive control algorithm that determines the optimal switching operations of the distribution system. The goal of the control algorithm is to find the optimal radial network topology that minimizes cumulative active power losses and maximizes voltages across the network while satisfying all system constraints. The optimization results are validated through multiple simulations, using real power demand data collected for a few characteristic days during winter and summer, which demonstrate the efficiency and usefulness of the developed control algorithm in reducing grid losses by up to 14%.
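As a toy illustration of radial reconfiguration (not the paper's nonlinear MPC with full power-flow constraints), one can brute-force the spanning trees of a small feeder and score each with a crude loss proxy; the network, loads, and resistances below are invented.

```python
# Enumerate radial topologies of a hypothetical 5-bus feeder and pick the one
# minimizing a resistive loss proxy: sum over branches of r * (downstream load)^2.
import itertools
import networkx as nx

edges = {(0, 1): 0.10, (0, 2): 0.15, (1, 2): 0.10,   # (from, to): resistance
         (1, 3): 0.20, (2, 4): 0.10, (3, 4): 0.25}
load = {1: 1.0, 2: 1.5, 3: 0.5, 4: 1.0}              # demand per bus; bus 0 = substation

def is_radial(branch_set):
    g = nx.Graph(list(branch_set))
    return g.number_of_nodes() == 5 and nx.is_tree(g)

def loss_proxy(branch_set):
    total = 0.0
    for e in branch_set:
        g = nx.Graph(list(branch_set))
        g.remove_edge(*e)                            # split the tree at branch e
        fed = nx.node_connected_component(g, 0)      # side still tied to bus 0
        downstream = sum(load.get(n, 0.0) for n in g.nodes if n not in fed)
        total += edges[e] * downstream ** 2
    return total

best = min((t for t in itertools.combinations(edges, 4) if is_radial(t)),
           key=loss_proxy)
print("best radial topology:", best, "proxy losses:", round(loss_proxy(best), 3))
```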
Abstract: Investors are always willing to receive more data. This has become especially true for the application of modern portfolio theory to the institutional asset allocation process, which requires quantitative estimates of risk and return. When long-term data series are unavailable for analysis, it has become common practice to use recent data only; the danger is that these data may not be representative of future performance. Although longer data series are of poorer quality, are difficult to obtain, and may reflect various political and economic regimes, they often paint a very different picture of emerging market performance. This paper presents an application of a stochastic nonlinear optimization model of portfolios including transaction costs in the Brazilian financial market, using portfolio theory and optimal control as the theoretical basis. The first strategy allocates all available wealth without considering portfolio risk (the deterministic result); in this case the investor obtained profits of 7.23% a month across the three risk aversion levels over the whole planning period. By contrast, the stochastic algorithm yields profits of 1.34% a month and 18.06% a year for an investor with low risk aversion, 0.88% a month and 11.02% a year for medium risk aversion, and 0.62% a month and 7.68% a year for high risk aversion.
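A single-period mean-variance allocation with proportional transaction costs gives a flavor of the trade-off; the numbers below are invented and this does not reproduce the paper's multi-period stochastic control model.

```python
# Toy mean-variance rebalancing with a proportional cost for trading away
# from the current weights w0; higher risk_aversion plays the role of the
# investor's risk preference.
import numpy as np
from scipy.optimize import minimize

mu = np.array([0.012, 0.009, 0.006])          # assumed expected monthly returns
Sigma = np.array([[0.040, 0.006, 0.004],
                  [0.006, 0.025, 0.005],
                  [0.004, 0.005, 0.010]])     # assumed return covariance
w0 = np.array([1 / 3, 1 / 3, 1 / 3])          # current portfolio
cost, risk_aversion = 0.002, 4.0              # fee rate and risk preference

def objective(w):
    # Maximize return - risk penalty - transaction costs (minimize negative).
    return -(mu @ w) + risk_aversion * (w @ Sigma @ w) + cost * np.abs(w - w0).sum()

res = minimize(objective, w0, method="SLSQP",
               bounds=[(0.0, 1.0)] * 3,
               constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}])
print(res.x)    # rebalanced weights net of the cost of trading away from w0
```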
Abstract: To reduce the fuel consumption of heavy commercial vehicles and lower transportation costs, this paper coordinates the driver-vehicle-road interaction system, fuses multi-dimensional information from the vehicle and its intelligent connected environment, and proposes an adaptive range predictive cruise control strategy (ARPCC) based on iterative dynamic programming (IDP). First, combining the vehicle state with multi-dimensional information about the road ahead, an adaptive range model based on vehicle longitudinal dynamics is built to reconstruct the road network, reducing the number of grid points, and IDP is used to obtain the globally optimal speed sequence. Second, on the basis of the globally optimal speed sequence, piecewise optimal speed sequences within the adaptive range are computed to enable fast solution of the vehicle control states. Finally, the strategy is validated in Matlab/Simulink. The results show that by shrinking the grid over successive iterations, the algorithm effectively improves computational efficiency and vehicle fuel economy.
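The grid-shrinking idea of IDP can be sketched on a toy speed-planning problem: each pass runs a backward DP over coarse speed grids centred on the current best profile, then contracts the grids. The road-grade profile, cost weights, and coupling term are assumptions, not the paper's vehicle model.

```python
# Iterative dynamic programming (Luus-style): repeated backward DP passes over
# shrinking speed grids for a toy fuel-vs-time velocity plan on 20 segments.
import numpy as np

segments, passes, npts = 20, 6, 7
v_ref = np.full(segments, 18.0)              # initial speed profile (m/s)
span = 6.0                                   # half-width of the search grid

def stage_cost(v, k):
    grade = 0.02 * np.sin(0.5 * k)           # hypothetical road-grade profile
    fuel = 0.01 * v ** 2 + 50.0 * max(grade, 0.0) * v / 18.0
    time = 1000.0 / v                        # 1 km per segment
    return fuel + 0.05 * time

for _ in range(passes):
    grids = [np.clip(v_ref[k] + np.linspace(-span, span, npts), 5.0, 25.0)
             for k in range(segments)]
    J = np.zeros(npts)                       # terminal cost-to-go
    arg = np.zeros((segments, npts), dtype=int)
    for k in range(segments - 1, -1, -1):    # backward DP pass
        Jk = np.empty(npts)
        for i, v in enumerate(grids[k]):
            if k == segments - 1:
                link, j = 0.0, 0
            else:                            # smoothness coupling + tail cost
                trans = 0.02 * (grids[k + 1] - v) ** 2 + J
                j = int(np.argmin(trans)); link = trans[j]
            Jk[i], arg[k, i] = stage_cost(v, k) + link, j
        J = Jk
    i = int(np.argmin(J))                    # forward pass: extract best path
    for k in range(segments):
        v_ref[k] = grids[k][i]
        i = arg[k, i]
    span *= 0.7                              # shrink the grid around the best path
print(np.round(v_ref, 1))
```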
Funding: The National Science Fund for Distinguished Young Scholars (60425206), the National Natural Science Foundation of China (60633010), and the Natural Science Foundation of Jiangsu Province (BK2006094).
Abstract: From a theoretical standpoint, the models of existing object-oriented programming languages have some shortcomings; for example, C# does not support metaclasses, and the primitive types of Java and C# are not objects. This paper therefore designs a programming language, Shrek, which integrates many language features and constructs in a compact and consistent model. The Shrek language is a class-based, purely object-oriented language. It has a dynamic strong type system and adopts a single-inheritance mechanism with mixins as its complement. It has a consistent class instantiation and inheritance structure, and the capability of intercessive structural computational reflection, which enables it to support safe metaclass programming. It also supports multi-threaded programming and automatic garbage collection, and enhances its expressive power through a native method mechanism. A prototype system of the Shrek language is implemented, and the anticipated design goals are achieved.
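As a loose analogy only (Shrek is its own language), Python's multiple inheritance can mimic the single-inheritance-plus-mixin composition described above.

```python
# One primary base class plus a stateless mixin supplying extra behaviour,
# roughly analogous to single inheritance with a mixin complement.
class Persistable:                 # mixin: adds behaviour, holds no base state
    def save(self):
        print(f"saving {self.__dict__}")

class Animal:                      # the single-inheritance base
    def __init__(self, name):
        self.name = name

class Dog(Persistable, Animal):    # base class composed with the mixin
    pass

Dog("rex").save()                  # -> saving {'name': 'rex'}
```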
Funding: Supported by the National Natural Science Foundation of China (91648204, 61601486), the State Key Laboratory of High Performance Computing Project Fund (1502-02), and the Research Programs of the National University of Defense Technology (ZDYYJCYJ140601).
Abstract: Unmanned aerial vehicles (UAVs) can play an important role in data collection and offloading over vast areas where wireless sensor networks are deployed, and the UAV's action strategy has a vital influence on applicability and computational complexity. Dynamic programming (DP) is well suited to UAV path planning, but it faces problems of applicability in special terrain environments and of algorithmic complexity. Based on an analysis of DP, this paper proposes a hierarchical directional DP (DDP) algorithm built on direction determination and a hierarchical model. We compare our method with Q-learning and a standard DP algorithm by experiments, and the results show that our method improves terrain applicability while greatly reducing computational complexity.
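A minimal reading of the directional idea: fix a sweep direction so each grid column is reachable only from the previous one, which reduces path planning to a single DP pass. The terrain costs below are random placeholders, and the hierarchical layer of the paper is omitted.

```python
# Directional DP on a cost grid: sweep left to right, each cell reachable from
# its three left-hand neighbours, giving an O(rows * cols) single-pass plan.
import numpy as np

rng = np.random.default_rng(1)
terrain = rng.uniform(1.0, 5.0, size=(8, 12))   # traversal cost per cell

rows, cols = terrain.shape
J = terrain[:, 0].copy()                        # cost-to-come for column 0
for c in range(1, cols):                        # fixed sweep direction
    Jc = np.empty(rows)
    for r in range(rows):
        lo, hi = max(r - 1, 0), min(r + 1, rows - 1)
        Jc[r] = terrain[r, c] + J[lo:hi + 1].min()   # best of 3 predecessors
    J = Jc
print("cheapest crossing cost:", J.min())
```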
Abstract: A stochastic resource allocation model based on the principles of Markov decision processes (MDPs) is proposed in this paper. In particular, a general-purpose framework is developed that takes into account resource requests for both instant and future needs. The framework can handle two types of reservations (i.e., specified and unspecified time-interval reservation requests) and implements an overbooking business strategy to further increase revenues. The resulting dynamic pricing problems can be regarded as sequential decision-making problems under uncertainty, which are solved by means of stochastic dynamic programming (DP) based algorithms. In this regard, Bellman's backward principle of optimality is exploited to provide the implementation mechanisms for the proposed reservation pricing algorithm. The curse of dimensionality, the inevitable issue of DP for both instant resource requests and future resource reservations, then arises. To address this scalability issue, an approximate dynamic programming (ADP) technique based on linear function approximation is applied. Several examples show the effectiveness of the proposed approach.
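Bellman's backward recursion for a finite-horizon problem is compact; the sketch below uses an invented reservation-pricing toy (capacity states, a few price levels) rather than the paper's full model with overbooking and ADP approximation.

```python
# Backward DP for finite-horizon dynamic pricing: states are units of free
# capacity, actions are quoted prices, and demand probability falls with price.
import numpy as np

T, capacity = 10, 5
prices = np.array([50.0, 80.0, 120.0])
p_sale = np.array([0.8, 0.5, 0.2])            # P(request accepted | price)

V = np.zeros(capacity + 1)                    # terminal value: unsold units = 0
policy = np.zeros((T, capacity + 1), dtype=int)
for t in range(T - 1, -1, -1):                # Bellman backward recursion
    Vt = np.zeros(capacity + 1)
    for s in range(1, capacity + 1):          # s units of capacity left
        # Expected value of each price: sell one unit w.p. p_sale, else keep V[s].
        q = p_sale * (prices + V[s - 1]) + (1 - p_sale) * V[s]
        policy[t, s] = int(np.argmax(q))
        Vt[s] = q.max()
    V = Vt
print("expected revenue from full capacity:", V[capacity])
print("price index to quote at t=0 per state:", policy[0])
```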