Abstract
Most quantities of interest in discounted and undiscounted (semi-) Markov decision processes can be obtained by solving a system of functional equations. This paper derives bounds and variational characterizations for the solutions of such systems. These are useful for at least three reasons: (1) in any solution procedure the upper and lower bounds can be used to measure the deviation of the current solution from optimality; (2) this in turn may permit elimination of suboptimal actions; and (3) the variational characterizations suggest numerical algorithms (linear programming, policy iteration algorithms, successive approximation schemes).
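Points (1) and (2) can be illustrated with a minimal sketch for the discounted, finite-state case only. It is not taken from the paper. It uses the standard MacQueen-type bounds for value iteration, and the function name, the array layout `P[a][s][t]` and `r[a][s]`, and the two-state example data are all assumptions made for this illustration.

```python
def value_iteration_with_bounds(P, r, beta, tol=1e-8, max_iter=10_000):
    """Value iteration for a discounted MDP with bounds on the optimal value.

    Hypothetical layout: P[a][s][t] is the probability of moving from state s
    to state t under action a, r[a][s] is the one-step reward, and beta is the
    discount factor with 0 < beta < 1.
    """
    n_actions, n_states = len(P), len(P[0])
    scale = beta / (1.0 - beta)
    v = [0.0] * n_states
    lower, upper = v[:], v[:]
    eliminated = set()  # (state, action) pairs proven suboptimal

    for _ in range(max_iter):
        # One Bellman backup: v_new(s) = max_a [ r(s,a) + beta * sum_t P(t|s,a) v(t) ]
        v_new = [
            max(
                r[a][s] + beta * sum(P[a][s][t] * v[t] for t in range(n_states))
                for a in range(n_actions)
            )
            for s in range(n_states)
        ]
        diff = [v_new[s] - v[s] for s in range(n_states)]

        # MacQueen-type bounds: lower <= v* <= upper, componentwise.
        lower = [v_new[s] + scale * min(diff) for s in range(n_states)]
        upper = [v_new[s] + scale * max(diff) for s in range(n_states)]

        # Action elimination test: an upper bound on the value of taking a in s
        # that falls strictly below the lower bound on v*(s) proves a suboptimal.
        for s in range(n_states):
            for a in range(n_actions):
                q_upper = r[a][s] + beta * sum(
                    P[a][s][t] * upper[t] for t in range(n_states)
                )
                if q_upper < lower[s]:
                    eliminated.add((s, a))

        v = v_new
        # The width of the bounds measures the deviation from optimality.
        if scale * (max(diff) - min(diff)) < tol:
            break

    return lower, upper, eliminated


# Hypothetical two-state, two-action example.
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # action 0: stay in the current state
    [[0.0, 1.0], [0.0, 1.0]],  # action 1: move to (or stay in) state 1
]
r = [
    [0.5, 2.0],  # rewards for action 0 in states 0 and 1
    [0.0, 0.0],  # rewards for action 1 in states 0 and 1
]
lower, upper, eliminated = value_iteration_with_bounds(P, r, beta=0.5)
```

For brevity the sketch only reports the eliminated pairs; a fuller implementation would also skip them in later backups. The undiscounted and semi-Markov cases treated in the paper need different bounds and are not covered here.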
Full Citation
Journal of Mathematical Analysis and Applications, vol. 117 (August 01, 1986): 326-357.