The Existence of a Moment Optimal Policy in Discounted Semi-Markov Decision Model with Unbounded Rewards
Abstract This paper studies the discounted semi-Markov decision model with a countable state space, an arbitrary action space, and Lippman-type unbounded rewards under the criterion of moment optimality. For every ε>0, the existence of a stationary k-th moment ε-optimal policy is proved; consequently, moment optimality among all policies is equivalent to moment optimality among stationary policies. A (k-1)-moment optimal policy π is also k-moment optimal if and only if (-1)^(k+1) V_k(π) satisfies the optimality equation, where V_k(π) is the k-th moment of the total discounted reward when π is used. Recursion formulae are given for all moments of the discounted reward under stationary policies. When the set of actions available in each state is finite, the existence of a stationary moment optimal policy is proved, and an iterative algorithm for constructing all stationary moment optimal policies is established.
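The abstract does not reproduce the paper's recursion formulae. As a rough illustration of what a moment recursion of this kind looks like, the sketch below treats a much simpler special case that is an assumption of this example, not the paper's setting: a finite-state, discrete-time Markov chain under a fixed stationary policy, with a deterministic reward r[i] collected on each visit to state i and a discount factor beta in (0, 1). Writing the return as R = r[i] + beta * R' and expanding R^k with the binomial theorem gives V_k = sum over m from 0 to k of C(k, m) * r^(k-m) * beta^m * (P V_m), with V_0 = 1. The names discounted_moments, P, r and beta are hypothetical.

```python
from math import comb


def discounted_moments(P, r, beta, k_max, tol=1e-12, max_iter=100_000):
    """Return [V_0, ..., V_k_max]; V_k[i] is the k-th moment of
    sum_t beta**t * r[X_t] for the chain started at X_0 = i.

    Assumed setting (not the paper's): finite state space, discrete time,
    row-stochastic matrix P, deterministic state rewards r, 0 < beta < 1.
    """
    n = len(r)
    moments = [[1.0] * n]  # V_0 is identically 1
    for k in range(1, k_max + 1):
        # Terms with m < k only involve lower moments, which are already known.
        known = []
        for i in range(n):
            total = 0.0
            for m in range(k):
                pv = sum(P[i][j] * moments[m][j] for j in range(n))
                total += comb(k, m) * r[i] ** (k - m) * beta ** m * pv
            known.append(total)
        # The m = k term gives V_k = known + beta**k * P V_k.
        # Since beta**k < 1 this map is a contraction, so iterate to a fixed point.
        v = [0.0] * n
        for _ in range(max_iter):
            new = [
                known[i] + beta ** k * sum(P[i][j] * v[j] for j in range(n))
                for i in range(n)
            ]
            diff = max(abs(a - b) for a, b in zip(new, v))
            v = new
            if diff < tol:
                break
        moments.append(v)
    return moments


if __name__ == "__main__":
    # Hypothetical two-state chain, used only to exercise the function.
    P = [[0.9, 0.1], [0.5, 0.5]]
    r = [1.0, 0.0]
    V = discounted_moments(P, r, beta=0.9, k_max=2)
    variance = [V[2][i] - V[1][i] ** 2 for i in range(2)]
    print(V[1], variance)
```

Each moment depends only on the lower ones, so the moments can be computed in increasing order of k. A semi-Markov model would additionally need the random sojourn times in the discounting, which this sketch omits.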
Author Wu Congbin (伍从斌)
Source Journal of Yunnan University (Natural Sciences Edition) (《云南大学学报(自然科学版)》), CAS, CSCD, 1991, No. 3, pp. 199-206 (8 pages)
Keywords discounted model, unbounded rewards, moments, optimal policy, stationary policy