Total Expected Discounted Reward MDPs: Existence of Optimal Policies

Eugene A. Feinberg*
Department of Applied Mathematics and Statistics
State University of New York at Stony Brook
Stony Brook, NY 11794-3600

Abstract

This article describes results on the existence of optimal and nearly optimal policies for Markov Decision Processes (MDPs) with total expected discounted rewards. The problem of optimizing total expected discounted rewards for MDPs is also known as discounted dynamic programming.

1 Introduction

Deterministic optimal policies always exist for discounted dynamic programming problems with finite action sets. Such policies also exist when the action sets satisfy certain compactness conditions and the transition probabilities and reward functions satisfy certain continuity conditions. If either the compactness or the continuity conditions do not hold, deterministic ε-optimal policies exist for problems with countable state spaces. For problems with uncountable Borel state spaces, results similar to the existence of deterministic ε-optimal policies hold, but in this more general case either the notion of ε-optimality must be replaced with the weaker notion of (p, ε)-optimality or a broader definition of a policy is required. Since the theory is simpler when the state space is countable, problems with countable and uncountable state spaces are considered separately in this chapter.

2 Countable state space

2.1 Definitions

Consider a Markov Decision Process (MDP) with state space X, action space A, sets of actions A(x) available at states x ∈ X, transition probabilities p, and rewards r.

* efeinberg@notes.cc.sunysb.edu
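As a concrete illustration of the finite-action case stated in the Introduction, the following minimal Python sketch (not from the article; the two-state MDP, its transition probabilities p, and rewards r are illustrative assumptions) runs value iteration and then extracts a deterministic optimal policy by a greedy one-step lookahead:

```python
# Minimal sketch of discounted dynamic programming on a finite MDP:
# value iteration approximates the optimal value function, and a greedy
# one-step lookahead then yields a deterministic optimal policy.
# The specific MDP below is a made-up two-state example.

X = [0, 1]                      # state space
A = {0: [0, 1], 1: [0, 1]}      # actions A(x) available at each state x
beta = 0.9                      # discount factor

# p[(x, a)] = {y: probability of moving to state y}; r[(x, a)] = one-step reward
p = {(0, 0): {0: 1.0}, (0, 1): {1: 1.0},
     (1, 0): {0: 1.0}, (1, 1): {1: 1.0}}
r = {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 0.0, (1, 1): 2.0}

def q_value(x, a, v):
    """One-step lookahead: r(x, a) + beta * E[v(next state)]."""
    return r[(x, a)] + beta * sum(prob * v[y] for y, prob in p[(x, a)].items())

def value_iteration(tol=1e-10):
    v = {x: 0.0 for x in X}
    while True:
        v_new = {x: max(q_value(x, a, v) for a in A[x]) for x in X}
        if max(abs(v_new[x] - v[x]) for x in X) < tol:
            return v_new
        v = v_new

v_star = value_iteration()
# Deterministic optimal policy: at each state, pick an action attaining the max.
policy = {x: max(A[x], key=lambda a: q_value(x, a, v_star)) for x in X}
```

Because the action sets here are finite, the maximum in the Bellman operator is always attained, so the greedy selection produces a deterministic optimal policy, in line with the existence result quoted above.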