Module Based Reinforcement Learning for a Real Robot

Zsolt Kalmar, Csaba Szepesvari and Andras Lorincz
Department of Adaptive Systems, Research Group on Artificial Intelligence
JATE, Szeged, Hungary

May 8, 1997

Abstract

The behaviour of reinforcement learning (RL) algorithms is best understood in completely observable, finite state- and action-space, discrete-time controlled Markov chains. Robot-learning domains, on the other hand, are inherently infinite both in time and space, and moreover they are only partially observable. Previous ways of overcoming these difficulties included state-space discretization based on a fixed set of features, and the use of behaviours or function approximators to work with continuous spaces, but these approaches were limited both in their scope and practical utility. In this article we suggest a systematic method which unifies earlier solutions. The motivation of our method comes from the desire to transform the task to be solved into a finite-state, discrete-time, "approximately" Markovian task, which is completely observable too. The key idea is to