Abstract

This paper considers undiscounted Markov Decision Problems. For the general multichain case, we obtain necessary and sufficient conditions which guarantee that the maximal total expected reward for a planning horizon of n epochs minus n times the long run average expected reward has a finite limit as n approaches infinity for each initial state and each final reward vector. In addition, we obtain a characterization of the chain and periodicity structure of the set of one-step and J-step maximal gain policies. Finally, we discuss the asymptotic properties of the undiscounted value-iteration method.
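
The quantity the abstract describes, the n-epoch maximal total expected reward minus n times the long-run average reward, can be illustrated with a minimal sketch. The two toy chains below, their rewards, the zero final reward vector, and the gain of 1.0 per epoch (worked out by hand for these examples) are all assumptions made for illustration and do not come from the paper. The first chain is aperiodic and the difference settles at once; the second is periodic with period 2 and the difference oscillates, so no limit exists.

```python
def value_iteration(P, r, v0, n_steps):
    """Return [v_0, ..., v_n] for the undiscounted recursion
    v_{k+1}(i) = max_a { r[i][a] + sum_j P[i][a][j] * v_k(j) }."""
    v = list(v0)
    history = [v]
    for _ in range(n_steps):
        v = [
            max(
                r[i][a] + sum(p * v[j] for j, p in enumerate(P[i][a]))
                for a in range(len(r[i]))
            )
            for i in range(len(r))
        ]
        history.append(v)
    return history


def deviations(history, gain):
    """Return v_n - n * gain for every n in the history."""
    return [[x - n * gain for x in v_n] for n, v_n in enumerate(history)]


# Hypothetical aperiodic example: every action moves to either state
# with probability 0.5. State 0 has a second, dominated action.
P_aperiodic = [[[0.5, 0.5], [0.5, 0.5]], [[0.5, 0.5]]]
r_aperiodic = [[2.0, 1.0], [0.0]]

# Hypothetical periodic example: the two states swap deterministically.
P_periodic = [[[0.0, 1.0]], [[1.0, 0.0]]]
r_periodic = [[2.0], [0.0]]

GAIN = 1.0  # long-run average reward per epoch in both toy examples

aperiodic_dev = deviations(
    value_iteration(P_aperiodic, r_aperiodic, [0.0, 0.0], 6), GAIN
)
periodic_dev = deviations(
    value_iteration(P_periodic, r_periodic, [0.0, 0.0], 6), GAIN
)

print(aperiodic_dev[5], aperiodic_dev[6])  # both equal [1.0, -1.0]
print(periodic_dev[5], periodic_dev[6])    # [1.0, -1.0] then [0.0, 0.0]
```

In the periodic chain the difference alternates between two vectors depending on the parity of n. This is the kind of behavior that conditions on the chain and periodicity structure have to rule out.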

Authors
Paul Schweitzer and Awi Federgruen
Format
Journal Article
Publication Date
November 1, 1977
Journal
Mathematics of Operations Research

Full Citation

Schweitzer, Paul and Awi Federgruen. “The asymptotic behavior of undiscounted value iteration in Markov decision problems.” Mathematics of Operations Research vol. 2 (November 1, 1977): 360-381.