The Existence of a Moment Optimal Policy in Discounted Semi-Markov Decision Model with Unbounded Rewards


Abstract: This paper studies a Lippman-type discounted semi-Markov decision model with a countable state space, arbitrary action spaces and unbounded rewards under the criterion of moment optimality. For every ε > 0, the existence of a stationary k-th moment ε-optimal policy is proved; consequently, moment optimality among all policies is equivalent to moment optimality among stationary policies. A (k-1) moment optimal policy π is also (k) moment optimal if and only if (-1)^(k+1) V_k(π) satisfies the optimality equation, where V_k(π) is the k-th moment of the total discounted reward when π is used. Recursion formulae are given for all moments of the discounted reward under stationary policies. When the set of actions available in each state is finite, the existence of a stationary moment optimal policy is proved, and an iterative algorithm for constructing all stationary moment optimal policies is developed.
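The recursion formulae and the finite-action algorithm themselves are not reproduced on this page. As a rough illustration only, the sketch below assumes a discrete-time special case of the model: finite states and actions, a constant discount factor β (in the semi-Markov model discounting depends on random sojourn times), and deterministic one-step rewards r(i, a). Under these assumptions the total discounted reward satisfies R = r(X_0) + βR', and the binomial theorem gives V_k = Σ_{j<k} C(k, j) r^(k-j) β^j P V_j + β^k P V_k with V_0 = 1. The selection step reads moment optimality lexicographically, as the sign (-1)^(k+1) in the abstract suggests: maximize the mean, then minimize the second moment among mean-optimal policies, and so on. It uses brute-force enumeration of deterministic stationary policies, a swapped-in technique and not the author's iteration algorithm. The function names `moments` and `moment_optimal_policies` and the toy model are hypothetical.

```python
from itertools import product
from math import comb


def moments(P, r, beta, K, tol=1e-12, max_iter=100_000):
    """First K moments V_1..V_K of the total discounted reward under one
    stationary deterministic policy (assumed discrete-time simplification).

    P[i][l]: transition probability i -> l under the policy,
    r[i]: deterministic one-step reward in state i, 0 < beta < 1.
    """
    n = len(r)
    V = [[1.0] * n]  # V_0 = E[R^0] = 1 in every state
    for k in range(1, K + 1):
        # (P V_j)(i) for every lower moment j = 0..k-1
        PV = [[sum(P[i][l] * Vj[l] for l in range(n)) for i in range(n)] for Vj in V]
        # Known part: sum_{j<k} C(k,j) r^(k-j) beta^j (P V_j)
        b = [sum(comb(k, j) * r[i] ** (k - j) * beta ** j * PV[j][i] for j in range(k))
             for i in range(n)]
        # Solve V_k = b + beta^k P V_k by fixed-point iteration (a contraction since beta^k < 1)
        x = [0.0] * n
        for _ in range(max_iter):
            new = [b[i] + beta ** k * sum(P[i][l] * x[l] for l in range(n)) for i in range(n)]
            done = max(abs(a - c) for a, c in zip(new, x)) < tol
            x = new
            if done:
                break
        V.append(x)
    return V[1:]  # [V_1, ..., V_K], each a list indexed by state


def moment_optimal_policies(A, P, r, beta, K):
    """Brute force over deterministic stationary policies f (f[i] = action in
    state i): at order k = 1..K keep only those maximizing (-1)^(k+1) V_k in
    every state. An empty list means no deterministic policy was pointwise best."""
    n = len(A)
    cands = []
    for f in product(*A):
        Pf = [P[i][f[i]] for i in range(n)]
        rf = [r[i][f[i]] for i in range(n)]
        cands.append((f, moments(Pf, rf, beta, K)))
    for k in range(1, K + 1):
        sign = (-1) ** (k + 1)  # maximize odd moments, minimize even ones
        best = [max(sign * V[k - 1][i] for _, V in cands) for i in range(n)]
        cands = [(f, V) for f, V in cands
                 if all(sign * V[k - 1][i] >= best[i] - 1e-9 for i in range(n))]
        if not cands:
            break
    return [f for f, _ in cands]


# Hypothetical toy model: in state 0, "safe" pays 1 and moves to absorbing state 1;
# "risky" pays 0 and moves to state 2 (which pays 4 once) or to state 1, each w.p. 0.5.
A = [["safe", "risky"], ["stay"], ["go"]]
P = [{"safe": [0, 1, 0], "risky": [0, 0.5, 0.5]}, {"stay": [0, 1, 0]}, {"go": [0, 1, 0]}]
r = [{"safe": 1, "risky": 0}, {"stay": 0}, {"go": 4}]

print(moment_optimal_policies(A, P, r, 0.5, 1))  # [('safe', 'stay', 'go'), ('risky', 'stay', 'go')]
print(moment_optimal_policies(A, P, r, 0.5, 2))  # [('safe', 'stay', 'go')]
```

With β = 0.5 both policies have mean 1 in state 0, but the risky one has second moment 2 against 1, so only the safe policy survives from order 2 on. Enumeration grows as the product of the action-set sizes, so this sketch only suits toy models.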

Author: Wu Congbin (伍从斌)

Affiliation: Department of Computer Science, Yunnan University

Source: Journal of Yunnan University (Natural Sciences Edition), 1991, No. 3, pp. 199-206 (8 pages); indexed in CAS and CSCD

Keywords: discounted model; unbounded rewards; moments; optimal policy; stationary policy

CLC classification: O212.5 [Science: Probability Theory and Mathematical Statistics]
