Abstract
This paper studies non-stationary Markov decision programming (MDP) in which the discount factor has the form β_n(i, a), that is, it depends on (i, a). Two sets of assumptions that allow unbounded rewards are proposed, and the results known for stationary positive and negative dynamic programming are shown to hold in this setting as well. Finally, the properties of the optimal policies of the model are investigated.
Source
《西安电子科技大学学报》
EI
CAS
CSCD
PKU Core Journals
1992, No. 1, pp. 72-83 (12 pages)
Journal of Xidian University
Keywords
Markov decision programming
unbounded rewards
optimal policy
non-stationary MDP
properties of the optimal policies