Robotics and Autonomous Systems 59 (2011) 910–922
doi:10.1016/j.robot.2011.07.004
Learning to pour with a robot arm combining goal and shape learning for
dynamic movement primitives
Minija Tamosiunaite (a,b,∗), Bojan Nemec (c), Aleš Ude (c), Florentin Wörgötter (a)

a University Göttingen, Institute for Physics 3 - Biophysics, Bernstein Center for Computational Neuroscience, Friedrich-Hund-Platz 1, 37077 Göttingen, Germany
b Vytautas Magnus University, Department of Informatics, Vileikos 8, 44404 Kaunas, Lithuania
c Jožef Stefan Institute, Department of Automatics, Biocybernetics, and Robotics, Jamova 39, 1000 Ljubljana, Slovenia

∗ Corresponding author at: University Göttingen, Institute for Physics 3 - Biophysics, Bernstein Center for Computational Neuroscience, Friedrich-Hund-Platz 1, 37077 Göttingen, Germany. Tel.: +49 370 687 37788; fax: +49 0 551 397720.
E-mail addresses: m.tamosiunaite@if.vdu.lt (M. Tamosiunaite), bojan.nemec@ijs.si (B. Nemec), ales.ude@ijs.si (A. Ude), worgott@physik3.gwdg.de (F. Wörgötter).
Article info

Article history:
Received 1 July 2010
Received in revised form 30 June 2011
Accepted 4 July 2011
Available online 12 July 2011

Keywords:
Reinforcement learning
PI²-method
Natural actor critic
Value function approximation
Dynamic movement primitives
Abstract

When describing robot motion with dynamic movement primitives (DMPs), goal (trajectory endpoint), shape, and temporal scaling parameters are used. In reinforcement learning with DMPs, the goal and temporal scaling parameters are usually predefined and only the weights shaping the DMP are learned. Many tasks exist, however, where the best goal position is not known a priori and must be learned as well. Here we therefore specifically address the question of how to combine goal and shape parameter learning simultaneously. This is a difficult problem because the two learning processes can easily interfere with each other in a destructive way. We apply value function approximation techniques for goal learning and direct policy search methods for shape learning. Specifically, we use ‘‘policy improvement with path integrals’’ (PI²) and the ‘‘natural actor critic’’ (NAC) for the policy search. We solve a learning-to-pour-liquid task in simulation as well as on a PA-10 robot arm. Results are presented for learning from scratch, for learning initialized by human demonstration, and for modifying the tool after the DMPs have been learned. We observe that the combination of goal and shape learning is stable and robust within large parameter regimes. Learning converges quickly even in the presence of disturbances, which makes this combined method suitable for robotic applications.
© 2011 Elsevier B.V. All rights reserved.
1. Introduction
Dynamic movement primitives (DMPs) proposed by Ijspeert
et al. [1] have become one of the most widely used tools for the
generation of robot movements. Numerous applications can be
found in the literature [2–5]. The DMP formalism is employed for
describing goal-directed movements and includes second-order
dynamics toward an attractor point, called the goal point g of the
movement, as well as several adjustable parameters that are used to obtain the desired shape of the trajectory (a standard formulation is recalled at the end of this paragraph). In this study, we consider the question of robot reinforcement learning using dynamic movement primitives. Several efficient methods have been proposed for DMP shape parameter learning. These include the natural actor critic (NAC, [3]), policy improvement with path integrals (PI², [5]), and policy learning by weighting exploration with the returns (PoWER, [4]). Using those methods, robots were
trained to acquire specific skills, for example jumping across a gap
by a robot dog [5], hitting a baseball with a robot arm [3], or playing
the ball-in-a-cup game using a humanoid robot [4].
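For reference, one standard formulation of a discrete DMP, following Ijspeert et al. [1], reads as follows (constants and the exact form of the forcing term vary slightly across the literature):

\begin{align*}
\tau \dot{z} &= \alpha_z \bigl( \beta_z (g - y) - z \bigr) + f(x),\\
\tau \dot{y} &= z,\\
\tau \dot{x} &= -\alpha_x x,\\
f(x) &= \frac{\sum_i \psi_i(x)\, w_i}{\sum_i \psi_i(x)}\; x \,(g - y_0), \qquad \psi_i(x) = \exp\bigl(-h_i (x - c_i)^2\bigr).
\end{align*}

Here $y$ is the position, $g$ the goal (attractor) point, $\tau$ the temporal scaling parameter, and $x$ the phase variable of the canonical system; the weights $w_i$ of the Gaussian basis functions $\psi_i$ are the shape parameters. Goal learning thus adjusts $g$, whereas shape learning adjusts the $w_i$.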
Here we consider the combination of DMP goal and shape learning. DMP goal learning has rarely been considered in robot experiments before [6,7], and the simultaneous combination of the two learning regimes is novel. The reason for this is that in most tasks considered so far the goal position is known well enough, so that goal learning is not required. There are, however, many tasks where this is not the case, namely whenever the goal has a hard-to-predict effect on the outcome. One example, which is also at the core of the current study, is the pouring of a liquid. The complex turbulent motion of the liquid at the rim of the container makes it very hard to predict at which position (the goal) the container should be placed for the best pouring results. The same holds for other dynamic tasks, such as throwing objects to hit a predefined target [7], or placing one object on top of another when the stable configuration of the two objects is not known in advance. Similar problems arise when working with tools, e.g. in hammering, where the arm is stopped at some specific position and the weight of the hammer delivers the final blow. Goal and shape both matter when working with soft materials, too: when putting a tablecloth on a table, the goal position as well as the shape of the swinging movement used to unfold the cloth are important. A minimal numerical sketch of such a combined learning loop is given below.
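To make the structure of such a combined loop concrete, the following minimal Python sketch is illustrative only and not the algorithm developed in this paper: the reward function, all constants, and the names run_dmp and reward are hypothetical; the shape update is an episodic, PI²-style reward-weighted average, and the goal update greedily picks the best-valued candidate as a simple stand-in for value function approximation.

import numpy as np

# Illustrative sketch: combined goal and shape learning for a 1-D DMP.
# Hypothetical reward and update rules; not the authors' exact algorithm.

def run_dmp(g, w, y0=0.0, tau=1.0, dt=0.005, alpha_z=25.0, alpha_x=2.0):
    """Euler-integrate a discrete DMP (Ijspeert et al. [1])."""
    beta_z = alpha_z / 4.0
    n = len(w)
    c = np.exp(-alpha_x * np.linspace(0.0, 1.0, n))  # basis centers in phase x
    h = n / c ** 2                                   # basis widths (heuristic)
    x, y, z, traj = 1.0, y0, 0.0, []
    for _ in range(int(1.0 / dt)):
        psi = np.exp(-h * (x - c) ** 2)
        f = (psi @ w) / (psi.sum() + 1e-10) * x * (g - y0)  # forcing term
        z += dt / tau * (alpha_z * (beta_z * (g - y) - z) + f)
        y += dt / tau * z
        x += dt / tau * (-alpha_x * x)
        traj.append(y)
    return np.array(traj)

def reward(traj, g_opt=1.0):
    """Hypothetical reward: endpoint near an unknown optimum, smooth path."""
    return -abs(traj[-1] - g_opt) - 1e-4 * np.sum(np.diff(traj) ** 2)

rng = np.random.default_rng(0)
g, w = 0.0, np.zeros(10)
for episode in range(200):
    # Shape learning: episodic PI^2-style reward-weighted averaging.
    eps = rng.normal(0.0, 20.0, size=(8, len(w)))          # exploration noise
    R = np.array([reward(run_dmp(g, w + e)) for e in eps])
    p = np.exp((R - R.max()) / 0.1)
    w = w + (p / p.sum()) @ eps                            # weighted update
    # Goal learning: pick the best-valued goal candidate (a greedy stand-in
    # for the value function approximation used in the paper).
    gc = g + rng.normal(0.0, 0.1, size=8)
    Rg = np.array([reward(run_dmp(cand, w)) for cand in gc])
    g = gc[np.argmax(Rg)]
print(f"learned goal g = {g:.3f}")

On this toy problem the goal estimate drifts toward the hidden optimum while the weights reduce the smoothness penalty; the paper develops the actual, more careful updates for both parameter sets.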