Movement generation by learning from demonstration and generalizing to new targets

Peter Pastor, Heiko Hoffmann, and Stefan Schaal
University of Southern California, Los Angeles, USA

We provide a general approach for learning robotic movements from human demonstration. To represent a recorded movement, a non-linear differential equation is adapted such that it reproduces this movement. Based on this representation, we build a library of movements by labeling each recorded movement according to task and context (e.g., grasping, placing, and releasing). Our differential equation is formed such that generalization can be achieved simply by adapting the start and goal parameters in the equation to the desired position values of a movement. The feasibility of our approach is demonstrated with the Sarcos slave robot arm: the robot pours water into several cups after we demonstrated the movement for one cup.

Humanoid robots assisting humans can become widespread only if they are easy to program. Easy programming might be achieved through learning from demonstration [1]: a human movement is recorded and later reproduced by a robot. Three challenges need to be mastered for this imitation: the correspondence problem, generalization, and robustness against perturbation.

The correspondence problem arises because the links and joints of the human and the robot may not match. Generalization is required because we cannot demonstrate every single movement that the robot is supposed to make; learning from demonstration is feasible only if a demonstrated movement can be generalized to other contexts, such as different goal positions. Finally, we need robustness against perturbation: exactly replaying an observed movement is unrealistic in a dynamic environment, in which obstacles may appear suddenly.

To address these issues, we present a model based on the dynamic movement primitive (DMP) framework [2, 3].
In this framework, any recorded movement can be represented by a set of differential equations. Representing a movement with a differential equation has the advantage that a perturbation is automatically corrected for by the dynamics of the system; this behavior provides the above-mentioned robustness. Furthermore, the equations are formulated in such a way that adaptation to a new goal is achieved by simply changing a goal parameter; this characteristic allows generalization. Here, we will present a new version of the dynamic equations with improved adaptation to goal changes.

In the present work, we use dynamic movement primitives to represent a movement trajectory in end-effector space; thus, we address the above-mentioned correspondence problem. In our robot demonstration, we use standard inverse kinematics to map the end-effector position and gripper orientation onto the appropriate joint angles.

To deal with complex motion, the above framework can be used to build a library of movement primitives out of which complex motion is composed by sequencing. For example, the library may contain a grasping, a placing, and a releasing motion. Each of these movements is recorded from a human demonstrator, represented by a differential equation, and labeled accordingly. To move an object on a table, for example, a grasping-placing-releasing sequence is required, and the corresponding primitives are recalled from the library. Due to the generalization ability of each dynamic movement primitive, an object may be placed between two arbitrary positions on the table based solely on the three demonstrated movements.

Dynamic movement primitives

Dynamic movement primitives can be used to generate discrete and rhythmic movements [2, 3]. Here, we focus on discrete movements and present a new variant of the equations.
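To make the framework concrete, the following Python sketch implements a one-dimensional discrete movement primitive of this kind: a critically damped second-order system driven by a phase-dependent forcing term that is fit to a demonstrated trajectory and can then be replayed towards an arbitrary goal. The gain values, the placement of the basis functions, and the least-squares fit of the weights are our own illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np

class DiscreteDMP:
    """Sketch of a discrete dynamic movement primitive (one dimension)."""

    def __init__(self, n_basis=20, K=100.0, alpha=4.0, tau=1.0):
        self.K = K
        self.D = 2.0 * np.sqrt(K)      # critical damping of the spring term
        self.alpha = alpha
        self.tau = tau
        # centres of the Gaussian basis functions, spaced along the phase (1 -> 0)
        self.c = np.exp(-alpha * np.linspace(0.0, 1.0, n_basis))
        self.h = 1.0 / np.gradient(self.c) ** 2   # widths from centre spacing
        self.w = np.zeros(n_basis)

    def _psi(self, theta):
        return np.exp(-self.h * (theta - self.c) ** 2)

    def _forcing(self, theta):
        psi = self._psi(theta)
        return theta * np.dot(self.w, psi) / (np.sum(psi) + 1e-10)

    def fit(self, x_demo, dt):
        """Fit the weights so the primitive reproduces the demonstration."""
        x0, g = x_demo[0], x_demo[-1]
        v = self.tau * np.gradient(x_demo, dt)      # v = tau * dx/dt
        vdot = np.gradient(v, dt)
        t = np.arange(len(x_demo)) * dt
        theta = np.exp(-self.alpha * t / self.tau)  # phase decays from 1 to 0
        # solve the transformation system for the required forcing term
        f_target = (self.tau * vdot - self.K * (g - x_demo)
                    + self.D * v + self.K * (g - x0) * theta) / self.K
        psi = self._psi(theta[:, None])             # shape (T, n_basis)
        features = theta[:, None] * psi / (psi.sum(1, keepdims=True) + 1e-10)
        self.w, *_ = np.linalg.lstsq(features, f_target, rcond=None)
        return x0, g

    def rollout(self, x0, g, dt, duration):
        """Integrate the system with Euler steps; returns the trajectory."""
        x, v, theta = x0, 0.0, 1.0
        traj = [x]
        for _ in range(int(duration / dt)):
            vdot = (self.K * (g - x) - self.D * v
                    - self.K * (g - x0) * theta
                    + self.K * self._forcing(theta)) / self.tau
            v += vdot * dt
            x += (v / self.tau) * dt
            theta += (-self.alpha * theta / self.tau) * dt
            traj.append(x)
        return np.array(traj)
```

Because the goal enters only as a parameter, calling `rollout` with a new value of `g` generalizes the demonstrated movement to a new target, while the spring-damper dynamics guarantee convergence to that goal even under perturbation.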
A movement is generated by integrating the following set of differential equations, which we will refer to as the 'transformation system':

    \tau \dot{v} = K(g - x) - Dv - K(g - x_0)\theta + Kf(\theta),   (1)
    \tau \dot{x} = v,                                               (2)

where x and v are the position and velocity of the system; x_0 and g are the start and goal position; \tau is a temporal scaling factor; K and D are constants, with D chosen such that the system is critically damped; and f is a non-linear function that can be adapted to allow the generation of arbitrarily complex movements. Equation (1) is motivated by human behavioral data and by the leg force fields observed in frogs after stimulation of the spinal cord [4].

The non-linear function is defined as

    f(\theta) = \frac{\sum_i w_i \psi_i(\theta)}{\sum_i \psi_i(\theta)} \theta,   (3)

where the \psi_i(\theta) = \exp(-h_i (\theta - c_i)^2) are Gaussian basis functions with centers c_i and widths h_i, and the w_i are adjustable weights. The function f does not depend directly on time; instead, it depends on a phase variable \theta, which decays from 1 towards 0 during a movement and is obtained by integrating

    \tau \dot{\theta} = -\alpha \theta,   (4)

where \alpha is a pre-defined constant.

To learn a movement from demonstration, first, a movement x(t) is recorded and its derivatives v(t) and \dot{v}(t) are computed for each time step t. Second, f(t) is computed by solving (1) for f. Third, (4) is integrated and \theta(t) evaluated. Using these arrays, we find the weights w_i in