J. Intelligent Learning Systems & Applications, 2010, 2: 69-79
doi:10.4236/jilsa.2010.22010 Published Online May 2010 (http://www.SciRP.org/journal/jilsa)
Copyright © 2010 SciRes. JILSA
Relational Reinforcement Learning with
Continuous Actions by Combining Behavioural
Cloning and Locally Weighted Regression
Julio H. Zaragoza, Eduardo F. Morales
National Institute of Astrophysics, Optics and Electronics, Computer Science Department, Tonantzintla, México.
Email: {jzaragoza, emorales}@inaoep.mx
Received October 30th, 2009; revised January 10th, 2010; accepted January 30th, 2010.
ABSTRACT
Reinforcement Learning is a commonly used technique for learning tasks in robotics; however, traditional algorithms are unable to handle the large amounts of data coming from the robot’s sensors, require long training times, and use discrete actions. This work introduces TS-RRLCA, a two-stage method to tackle these problems. In the first stage, low-level data coming from the robot’s sensors is transformed into a more natural, relational representation based on rooms, walls, corners, doors and obstacles, significantly reducing the state space. We use this representation along with Behavioural Cloning, i.e., traces provided by the user, to learn, in a few iterations, a relational control policy with discrete actions that can be re-used in different environments. In the second stage, we use Locally Weighted Regression to transform the initial policy into a continuous-actions policy. We tested our approach in simulation and with a real service robot in different environments on several navigation and following tasks. Results show that the policies can be used in different domains and produce smoother, faster and shorter paths than the original discrete-actions policies.
Keywords: Relational Reinforcement Learning, Behavioural Cloning, Continuous Actions, Robotics
1. Introduction
Nowadays it is possible to find service robots for many different tasks, such as entertainment, assistance, maintenance, cleaning, transport and guidance. Due to the wide range of services they provide, the incorporation of service robots into places like houses and offices has increased in recent years. Their complete incorporation and acceptance, however, will depend on their capability to learn new tasks. Unfortunately, programming service robots to learn new tasks is a complex, specialized and time-consuming process.
An alternative and more attractive approach is to show the robot how to perform a task, rather than trying to program it, and let the robot learn the fine details of how to perform it. This is the approach that we follow in this paper.
Reinforcement Learning (RL) [1] has been widely used and suggested as a good candidate for learning tasks in robotics, e.g., [2-9]. This is mainly because it allows an agent, i.e., the robot, to “autonomously” develop a control policy for performing a new task while interacting with its environment. The robot only needs to know the goal of the task, i.e., the final state, and a set of possible actions associated with each state.
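To make this setting concrete, the following is a minimal tabular Q-learning sketch of the kind of RL loop described above. It is purely illustrative: the toy corridor domain, the reward and transition functions, and all parameter values are hypothetical and are not those used by the robot in this paper.

```python
import random

random.seed(0)  # for reproducibility of this illustrative run

def q_learning(states, actions, reward, transition, goal,
               episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1):
    """Learn a state-action value table by interacting with the environment,
    knowing only the goal state and the actions available in each state."""
    Q = {(s, a): 0.0 for s in states for a in actions}
    for _ in range(episodes):
        s = random.choice(states)
        while s != goal:
            # epsilon-greedy action selection: explore occasionally
            if random.random() < epsilon:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda act: Q[(s, act)])
            s2 = transition(s, a)
            r = reward(s, a, s2)
            # temporal-difference update toward the one-step target
            target = r + gamma * max(Q[(s2, b)] for b in actions)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s2
    # greedy control policy derived from the learned values
    return {s: max(actions, key=lambda act: Q[(s, act)]) for s in states}

# Hypothetical toy domain: positions 0..4 along a corridor, goal at 4,
# actions move one step left (-1) or right (+1), reward on reaching the goal.
states = list(range(5))
actions = [-1, +1]
policy = q_learning(states, actions,
                    reward=lambda s, a, s2: 1.0 if s2 == 4 else 0.0,
                    transition=lambda s, a: min(4, max(0, s + a)),
                    goal=4)
```

After training, the greedy policy moves right from every non-goal position. Even in this five-state toy problem, hundreds of episodes are needed, which hints at why the continuous, high-dimensional state spaces of real robots make plain tabular RL impractical.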
The use and application of traditional RL techniques, however, has been hampered by four main aspects: 1) the vast amount of data produced by the robot’s sensors, 2) large search spaces, 3) the use of discrete actions, and 4) the inability to re-use previously learned policies in new, although related, tasks.
Robots are normally equipped with laser range sensors, rings of sonars, cameras, etc., all of which produce a large number of readings at high sample rates, creating problems for many machine learning algorithms.
Large search spaces, on the other hand, produce very long training times, which is a problem for service robots, where the state space is continuous and the description of a state may involve several variables. Researchers have proposed different strategies to deal with continuous state and action spaces, normally based on a discretization of the state space with discrete actions or on function approximation techniques. However, discrete actions produce unnatural movements and slow paths for a robot
and function approximation techniques tend to be com-