J. Intelligent Learning Systems & Applications, 2010, 2: 69-79
doi:10.4236/jilsa.2010.22010 Published Online May 2010 (http://www.SciRP.org/journal/jilsa)
Copyright © 2010 SciRes. JILSA

Relational Reinforcement Learning with Continuous Actions by Combining Behavioural Cloning and Locally Weighted Regression

Julio H. Zaragoza, Eduardo F. Morales
National Institute of Astrophysics, Optics and Electronics, Computer Science Department, Tonantzintla, México.
Email: {jzaragoza, emorales}@inaoep.mx
Received October 30th, 2009; revised January 10th, 2010; accepted January 30th, 2010.

ABSTRACT

Reinforcement Learning is a commonly used technique for learning tasks in robotics; however, traditional algorithms are unable to handle the large amounts of data coming from a robot's sensors, require long training times, and use discrete actions. This work introduces TS-RRLCA, a two-stage method that tackles these problems. In the first stage, low-level data coming from the robot's sensors is transformed into a more natural, relational representation based on rooms, walls, corners, doors and obstacles, significantly reducing the state space. We use this representation along with Behavioural Cloning, i.e., traces provided by the user, to learn, in few iterations, a relational control policy with discrete actions that can be re-used in different environments. In the second stage, we use Locally Weighted Regression to transform the initial policy into a continuous-actions policy. We tested our approach in simulation and with a real service robot in different environments on several navigation and following tasks. Results show that the policies can be used in different domains and produce smoother, faster and shorter paths than the original discrete-actions policies.

Keywords: Relational Reinforcement Learning, Behavioural Cloning, Continuous Actions, Robotics

1. Introduction

Nowadays it is possible to find service robots for many different tasks, such as entertainment, assistance, maintenance, cleaning, transport and guidance. Due to the wide range of services they provide, the incorporation of service robots into places like houses and offices has increased in recent years. Their full incorporation and acceptance, however, will depend on their capability to learn new tasks. Unfortunately, programming service robots for new tasks is a complex, specialized and time-consuming process. An alternative and more attractive approach is to show the robot how to perform a task, rather than trying to program it, and let the robot learn the fine details of how to perform it. This is the approach we follow in this paper.

Reinforcement Learning (RL) [1] has been widely used and suggested as a good candidate for learning tasks in robotics, e.g., [2-9], mainly because it allows an agent, i.e., the robot, to "autonomously" develop a control policy for performing a new task while interacting with its environment. The robot only needs to know the goal of the task, i.e., the final state, and a set of possible actions associated with each state.

The use and application of traditional RL techniques, however, has been hampered by four main aspects: 1) the vast amount of data produced by the robot's sensors, 2) large search spaces, 3) the use of discrete actions, and 4) the inability to re-use previously learned policies in new, although related, tasks. Robots are normally equipped with laser range sensors, rings of sonars, cameras, etc., all of which produce a large number of readings at high sample rates, creating problems for many machine learning algorithms. Large search spaces, on the other hand, produce very long training times, which is a problem for service robots, where the state space is continuous and a description of a state may involve several variables.
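The standard RL setting just described (an agent that knows only a goal state and the actions available in each state) can be illustrated with a minimal tabular Q-learning sketch. This is a generic illustration, not the method proposed in this paper; the corridor environment, reward values and hyperparameters are hypothetical choices for the example.

```python
import random

def q_learning(n_states, n_actions, step, goal, episodes=500,
               alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning: the agent learns a policy purely by
    interacting with the environment through `step`."""
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != goal:
            # epsilon-greedy action selection
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda x: Q[s][x])
            s2, r = step(s, a)
            # temporal-difference update toward the best next value
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q

# Toy 1-D corridor: 5 discrete states, actions 0=left, 1=right,
# goal at state 4, small step penalty to encourage short paths.
def step(s, a):
    s2 = max(0, min(4, s + (1 if a == 1 else -1)))
    return s2, (1.0 if s2 == 4 else -0.01)

Q = q_learning(5, 2, step, goal=4)
# Greedy policy for the non-goal states: move right everywhere.
policy = [max(range(2), key=lambda a: Q[s][a]) for s in range(4)]
```

Even in this tiny example the limitations named above are visible: the table grows with every state variable, and the two discrete actions can only approximate the smooth motions a real robot needs.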
Researchers have proposed different strategies to deal with continuous state and action spaces, normally based on a discretization of the state space with discrete actions or on function approximation techniques. However, discrete actions produce unnatural movements and slow paths for a robot, and function approximation techniques tend to be com-