Abstract
This paper studies Lippman-type discounted semi-Markov decision models with a countable state space, an arbitrary action space, and unbounded rewards under the criterion of moment optimality. For every ε > 0, the existence of a stationary k-th moment ε-optimal policy is proved; consequently, moment optimality among all policies is equivalent to moment optimality among stationary policies. A (k-1)-th moment optimal policy π is also k-th moment optimal if and only if (-1)^(k+1) V_k(π) satisfies the optimality equation, where V_k(π) is the k-th moment of the total discounted reward when π is used. Recursion formulae are given for all moments of the discounted reward under stationary policies. When the set of actions available in each state is finite, the existence of a moment optimal stationary policy is proved, and an iterative algorithm for constructing all moment optimal stationary policies is established.
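The idea of a moment recursion for a stationary policy can be illustrated in a setting much simpler than the one treated in the paper. The sketch below is not the paper's formula: it assumes a finite state space, discrete time steps, a bounded reward r(i) that depends only on the current state, and a fixed transition matrix P induced by a stationary policy. Under those assumptions the total discounted reward X satisfies X = r(i) + βX', so expanding (r(i) + βX')^k with the binomial theorem expresses the k-th moment through the moments of order at most k. The function name `discounted_moments` and all of its parameters are hypothetical.

```python
from math import comb


def discounted_moments(P, r, beta, k_max, sweeps=2000):
    """Return [V_0, ..., V_k_max] with V_k[i] = E[(sum_t beta^t r(x_t))^k | x_0 = i].

    Illustrative assumptions (not taken from the paper): finite state space,
    discrete time, state-dependent bounded reward r, row-stochastic matrix P,
    and discount factor 0 <= beta < 1.
    """
    n = len(r)
    moments = [[1.0] * n]  # zeroth moment is identically 1
    for k in range(1, k_max + 1):
        # Expected next-state value of every lower-order moment: (P V_m)(i).
        PV = [[sum(P[i][j] * V[j] for j in range(n)) for i in range(n)]
              for V in moments]
        # Terms of the binomial expansion that involve only moments m < k.
        b = [sum(comb(k, m) * r[i] ** (k - m) * beta ** m * PV[m][i]
                 for m in range(k))
             for i in range(n)]
        # Remaining term gives V_k = b + beta^k P V_k, a contraction,
        # solved here by plain fixed-point iteration.
        V = [0.0] * n
        for _ in range(sweeps):
            V = [b[i] + beta ** k * sum(P[i][j] * V[j] for j in range(n))
                 for i in range(n)]
        moments.append(V)
    return moments
```

For a single absorbing state with reward 1 and β = 0.5 the return is the constant 2, so the sketch should reproduce the powers of 2 as moments. The fixed-point iteration is chosen only to keep the example dependency-free; a direct linear solve would serve equally well.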
Source
《云南大学学报(自然科学版)》
CAS
CSCD
1991, No. 3, pp. 199-206 (8 pages)
Journal of Yunnan University (Natural Sciences Edition)
Keywords
discounted model, unbounded rewards, moments, optimal policy, stationary policy