Task-Driven Discretization of the Joint Space of Visual Percepts and Continuous Actions

Sébastien Jodogne and Justus H. Piater
University of Liège — Montefiore Institute (B28), B-4000 Liège, Belgium
{S.Jodogne,Justus.Piater}@ULg.ac.be

Abstract. We target the problem of closed-loop learning of control policies that map visual percepts to continuous actions. Our algorithm, called Reinforcement Learning of Joint Classes (RLJC), adaptively discretizes the joint space of visual percepts and continuous actions. In a sequence of attempts to remove perceptual aliasing, it incrementally builds a decision tree that applies tests either in the input perceptual space or in the output action space. The leaves of such a decision tree induce a piecewise-constant, optimal state-action value function, which is computed through a reinforcement learning algorithm that uses the tree as a function approximator. The optimal policy is then derived by selecting the action that, given a percept, leads to the leaf that maximizes the value function. Our approach is quite general and also applies to learning mappings from continuous percepts to continuous actions. A simulated visual navigation problem illustrates the applicability of RLJC.

1 Introduction

Reinforcement Learning (RL) [1, 2] is an attractive framework for the automatic design of robotic controllers. RL algorithms are able to learn direct mappings from percepts to actions given a set of interactions of the robotic agent with its environment. These algorithms build on a careful analysis of a so-called reinforcement signal that implicitly defines the task to be solved. Using RL potentially simplifies the design process, since real-world robotic applications are generally difficult to model and to solve directly in a programming language.

Unfortunately, although robotic controllers often interact with their environment through a set of continuously-valued actions (position, velocity, torque, ...), relatively little consideration has been given to the development of RL algorithms that learn direct mappings from percepts to continuous actions. This is in contrast to continuous perceptual spaces, for which many solutions exist. The challenge of continuous action spaces arises from the fact that standard update rules based upon Bellman's optimality equations are only applicable to finite sets of actions, as they rely on a maximization over the action space. Furthermore, an a priori discretization of the action space generally suffers from an explosion of the representational size of the domain, known as the curse of dimensionality, and may introduce artificial noise.
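To make the dependence on a finite action set concrete, the following minimal sketch (over a hypothetical toy domain, not the paper's RLJC algorithm) shows a standard tabular Q-learning backup: the `max` term must enumerate every action, which is only possible when the action set is finite and discrete.

```python
# Minimal sketch over a hypothetical toy domain: a tabular Q-learning
# backup, whose max over successor actions requires a finite action set.
ALPHA, GAMMA = 0.1, 0.9          # learning rate, discount factor
ACTIONS = [0, 1, 2]              # finite, discrete action set (assumed)
Q = {}                           # tabular state-action value function

def q(s, a):
    return Q.get((s, a), 0.0)

def update(s, a, r, s_next):
    # Bellman-style backup: the max below enumerates all actions,
    # which is impossible over a continuous action space.
    best_next = max(q(s_next, a2) for a2 in ACTIONS)
    Q[(s, a)] = q(s, a) + ALPHA * (r + GAMMA * best_next - q(s, a))

update(s=0, a=1, r=1.0, s_next=0)
print(round(q(0, 1), 3))  # first backup: 0 + 0.1 * (1.0 + 0.9*0 - 0) = 0.1
```

An a priori discretization sidesteps this by replacing the continuous action space with a fixed grid such as `ACTIONS`, but the grid size grows exponentially with the action dimension, which is the curse of dimensionality mentioned above.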