The double-factored decision theory for Markov decision processes with multiple scenarios of the parameters is proposed in this article.We introduce scenario belief to describe the probability distribution of scenario...The double-factored decision theory for Markov decision processes with multiple scenarios of the parameters is proposed in this article.We introduce scenario belief to describe the probability distribution of scenarios in the system,and scenario expectation to formulate the expected total discounted reward of a policy.We establish a new framework named as double-factored Markov decision process(DFMDP),in which the physical state and scenario belief are shown to be the double factors serving as the sufficient statistics for the history of the decision process.Four classes of policies for the finite horizon DFMDPs are studied and it is shown that there exists a double-factored Markovian deterministic policy which is optimal among all policies.We also formulate the infinite horizon DFMDPs and present its optimality equation in this paper.An exact solution method named as double-factored backward induction for the finite horizon DFMDPs is proposed.It is utilized to find the optimal policies for the numeric examples and then compared with policies derived from other methods from the related literatures.展开更多
基金supported by the(United States)National Science Foundation(No.1409214)。
文摘The double-factored decision theory for Markov decision processes with multiple scenarios of the parameters is proposed in this article.We introduce scenario belief to describe the probability distribution of scenarios in the system,and scenario expectation to formulate the expected total discounted reward of a policy.We establish a new framework named as double-factored Markov decision process(DFMDP),in which the physical state and scenario belief are shown to be the double factors serving as the sufficient statistics for the history of the decision process.Four classes of policies for the finite horizon DFMDPs are studied and it is shown that there exists a double-factored Markovian deterministic policy which is optimal among all policies.We also formulate the infinite horizon DFMDPs and present its optimality equation in this paper.An exact solution method named as double-factored backward induction for the finite horizon DFMDPs is proposed.It is utilized to find the optimal policies for the numeric examples and then compared with policies derived from other methods from the related literatures.