Value Iteration over Belief Subspace

Weihong Zhang
wzhang@cs.ust.hk
Department of Computer Science
Hong Kong University of Science & Technology
Clear Water Bay, Kowloon, Hong Kong, China

Abstract. Partially Observable Markov Decision Processes (POMDPs) provide an elegant framework for AI planning tasks with uncertainties. Value iteration is a well-known algorithm for solving POMDPs. It is notoriously expensive because at each step it must account for every belief state in a continuous space. In this paper, we show that value iteration can be conducted over a subset of belief space. We then study a class of POMDPs, namely informative POMDPs, where each observation provides good albeit incomplete information about world states. For informative POMDPs, value iteration can be conducted over a small subset of belief space. This yields two advantages: first, fewer vectors are needed to represent value functions; second, value iteration can be accelerated. Empirical studies are presented to demonstrate these two advantages.

1 Introduction

Partially Observable Markov Decision Processes (POMDPs) provide a general framework for AI planning problems where the effects of actions are nondeterministic and the state of the world is not known with certainty. Unfortunately, solving general POMDPs is computationally intractable (e.g., [12]). Although much recent effort has been devoted to finding efficient algorithms for POMDPs, there is still a significant distance to go before realistic problems can be solved.

Value iteration [13] is a standard algorithm for solving POMDPs. It conducts a sequence of dynamic programming (DP) updates to improve the value of each belief state in belief space. Because there are uncountably many belief states, DP updates, and hence value iteration, are computationally prohibitive in practice. In this paper, we propose to conduct DP updates, and hence value iteration, over a subset of belief space.
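To make the idea of a DP update over a restricted set of belief states concrete, the following is a minimal sketch of value iteration carried out only at a finite collection of beliefs. The two-state, two-action, two-observation model (T, Z, R), the discount factor, and the nearest-neighbor value lookup are all illustrative assumptions for this sketch, not details taken from the paper.

```python
import numpy as np

# Illustrative sketch: one DP update restricted to a finite set of
# belief states.  The tiny model below (T, Z, R) is a made-up example.

n_a, n_o = 2, 2
T = np.array([[[0.9, 0.1], [0.2, 0.8]],     # T[a, s, s'] transition probs
              [[0.5, 0.5], [0.5, 0.5]]])
Z = np.array([[[0.8, 0.2], [0.3, 0.7]],     # Z[a, s', o] observation probs
              [[0.6, 0.4], [0.4, 0.6]]])
R = np.array([[1.0, 0.0],                   # R[a, s] rewards
              [0.0, 1.0]])
gamma = 0.95

def belief_update(b, a, o):
    """Bayes update: b'(s') is proportional to Z[a,s',o] * sum_s T[a,s,s'] b(s)."""
    unnorm = Z[a, :, o] * (T[a].T @ b)
    p_o = unnorm.sum()                      # probability of observing o
    return (unnorm / p_o if p_o > 0 else None), p_o

def dp_update(V, beliefs):
    """One DP update, evaluating values only at beliefs in the finite set."""
    def value(b):
        # Approximate V at an arbitrary belief by its nearest stored belief.
        key = min(V, key=lambda k: np.abs(np.asarray(k) - b).sum())
        return V[key]
    new_V = {}
    for key in beliefs:
        b = np.asarray(key)
        q_values = []
        for a in range(n_a):
            q = R[a] @ b                    # expected immediate reward
            for o in range(n_o):
                b_next, p_o = belief_update(b, a, o)
                if b_next is not None:
                    q += gamma * p_o * value(b_next)
            q_values.append(q)
        new_V[key] = max(q_values)
    return new_V

# Value iteration over a three-point belief subset.
beliefs = [(1.0, 0.0), (0.5, 0.5), (0.0, 1.0)]
V = {b: 0.0 for b in beliefs}
for _ in range(3):
    V = dp_update(V, beliefs)
```

Each update touches only the beliefs in the chosen subset, which is precisely why restricting value iteration to a small subspace can pay off when, as in informative POMDPs, the reachable beliefs form a small part of the simplex.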
The subset is referred to as the belief subspace, or simply the subspace. It consists of all possible belief states the agent can encounter. As value iteration is conducted over the subspace, each DP update accounts only for belief states in the subspace. The hope is that DP updates over a subset are more efficient than those over the entire belief space. However, for general POMDPs, it is difficult to represent this subspace and to perform implicit DP updates. Furthermore, the subspace can occasionally be as large as the original belief space, in which case value iteration over the subspace provides no benefit at all.

We study a class of special POMDPs, namely informative POMDPs, where any observation restricts the world to a small set of states. For informative POMDPs,