Abstract
This paper discusses average-reward Markov decision programming (abbreviated MDP) with a countable state space. Through model transformations, semi-Markov decision programming and continuous-time MDP are each reduced to discrete-time MDP. The transformations keep the optimality equations of the models equivalent, and the latter transformation even preserves the average-reward objective function, so that most results for discrete-time MDP carry over directly to the other two classes of MDP. Finally, the relationship between the optimality of the policy π<sub>0</sub><sup>∞</sup> and the optimality equation is discussed.
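For context, the reduction of a continuous-time MDP to a discrete-time one described in the abstract is typically carried out by uniformization. The following sketch uses standard notation (gain ρ, bias h, transition rates q, uniformization constant Λ) assumed here for illustration, not taken from the paper itself:

```latex
% Average-reward optimality equation for a discrete-time MDP
% with gain \rho and bias function h:
\rho + h(i) = \max_{a \in A(i)} \Big[\, r(i,a) + \sum_{j} p(j \mid i, a)\, h(j) \Big].

% Uniformization of a continuous-time MDP with transition rates
% q(j \mid i, a) and total rate q_i(a) = \sum_{j \neq i} q(j \mid i, a):
% choose \Lambda \ge \sup_{i,a} q_i(a) and define
\tilde{p}(j \mid i, a) =
  \begin{cases}
    q(j \mid i, a) / \Lambda, & j \neq i, \\[2pt]
    1 - q_i(a) / \Lambda,     & j = i,
  \end{cases}
\qquad
\tilde{r}(i,a) = r(i,a) / \Lambda .
```

Under this construction a policy is average-optimal in the continuous-time model exactly when it is average-optimal in the uniformized discrete-time model, the gains differing only by the factor Λ, which is one way the equivalence of objective functions mentioned in the abstract can be realized.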
Source
Journal of Xidian University (《西安电子科技大学学报》)
Indexed in: EI, CAS, CSCD, Peking University Core Journals (北大核心)
1991, No. 1, pp. 63-71 (9 pages)
Keywords
model transformation
average-reward model
MDP
model
Markov decision programming
average rewards