Robotics: Science and Systems 2019, Freiburg im Breisgau, June 22-26, 2019

Conditional Neural Movement Primitives

M. Yunus Seker*, Mert Imre*, Justus Piater† and Emre Ugur*
*Computer Engineering Department, Bogazici University, Istanbul, Turkey
Email: {yunus.seker1, mert.imre, emre.ugur}@boun.edu.tr
†Department of Computer Science, Universität Innsbruck, Austria
Email: justus.piater@uibk.ac.at

Abstract—Conditional Neural Movement Primitives (CNMPs) form a learning from demonstration framework designed as a robotic movement learning and generation system built on top of a recent deep neural architecture, namely Conditional Neural Processes (CNPs). Based on CNPs, CNMPs extract prior knowledge directly from the training data by sampling observations from it, and use this knowledge to predict a conditional distribution over any other target points. CNMPs learn complex temporal multi-modal sensorimotor relations in connection with external parameters and goals; produce movement trajectories in joint or task space; and execute these trajectories through a high-level feedback control loop. Conditioned on an external goal that is encoded in the sensorimotor space of the robot, the CNMP generates the sensorimotor trajectory that is expected to be observed during successful execution of the task, and the corresponding motor commands are executed. In order to detect and react to unexpected events during action execution, the CNMP is further conditioned on the actual sensor readings at each time step. Through simulations and real robot experiments, we showed that CNMPs can learn the non-linear relations between low-dimensional parameter spaces and complex movement trajectories from a few demonstrations, and that they can also model the associations between high-dimensional sensorimotor spaces and complex motions using a large number of demonstrations.
The experiments further showed that even when the task parameters were not explicitly provided to the system, the robot could learn their influence by associating the learned sensorimotor representations with the movement trajectories. The robot, for example, learned the influence of object weights and shapes by exploiting its sensorimotor space, which includes proprioception and force measurements, and was able to change the movement trajectory on the fly when one of these factors was changed through external intervention.

I. INTRODUCTION

Acquiring an advanced robotic skill set requires a robot to learn complex temporal multi-modal sensorimotor relations in connection with external parameters and goals. The learning from demonstration (LfD) framework [1] has been proposed in robotics as an efficient and intuitive way to teach such skills to robots: the robot observes, learns, and reproduces the demonstrated behavior. While teaching a skill, how the task is influenced by different factors is generally obvious to humans, yet this knowledge is mostly hidden in the experienced sensorimotor data of the robot and is difficult to extract autonomously. The development of feature extraction and learning methods that are sufficiently general and flexible for a broad range of robotic tasks still stands as an important challenge in robotics. In order to deal with the large variety of tasks that are influenced by factors defined at different levels of abstraction, we require a single framework that can learn feature-movement and raw-data-movement associations from small and large numbers of demonstrations, respectively, automatically filtering out the irrelevant information.

Another challenge in LfD is dealing with multiple trajectories: multiple demonstrations might be required to teach a skill, either because different action trajectories are required in different situations or simply because there are multiple ways to achieve the same task even in the same setting.
The robot, in turn, needs to capture the important characteristics of the observations coming from several demonstrations, and must be able to reproduce the learned skill in new configurations, reacting to unexpected events, such as external perturbations, on the fly. For this, the robot needs to develop an understanding of whether and how the sensorimotor data and externally provided parameters are related to each other and to the motor commands. While one or more of the above properties are addressed by existing movement frameworks [14, 15, 6, 2, 16, 4, 13], none of these approaches handles all of these requirements in a single framework.

In this paper, we propose Conditional Neural Movement Primitives (CNMP), a robotic framework built on top of a recent neural network architecture, namely Conditional Neural Processes (CNP), to encode movement primitives with the identified functionalities. Given multiple demonstrations, CNMP encodes the statistical regularities by learning the distributions of the observations from reasonably few input data. A conditioning mechanism is used to predict a sensorimotor trajectory given external goals and the current sensorimotor information. Conditioning can be applied over the learned sensorimotor distribution using any set of variables, such as joint positions, visual features, or haptic feedback, at any point in time. For example, for a learned grasp skill, the system might be queried to predict and generate hand and finger motion trajectories conditioned on the color of the object, the weight measured at the wrist joint, and the target aperture width of the fingers. Given low- or high-dimensional input, CNMP can use simple MLPs or convolution operations, respectively, to encode the correlations. Such networks allow the system to automatically extract the features that are relevant to the task. Finally, the predicted trajectory is realized by generating the corresponding actuator commands.
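The CNP-style conditioning described above can be illustrated with a minimal sketch: each observed (time, sensorimotor value) pair is encoded by an MLP, the resulting latent vectors are averaged into a single permutation-invariant representation, and a decoder predicts a mean and standard deviation for any queried time point. All function names and dimensions below are illustrative, and the random weights stand in for a trained model; this is a sketch of the conditioning mechanism, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(sizes):
    """Random-weight MLP parameters (stand-in for trained weights)."""
    return [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """Apply an MLP with tanh activations on all but the last layer."""
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:
            x = np.tanh(x)
    return x

d_sm, d_latent = 2, 16                       # sensorimotor dim, latent dim
encoder = mlp([1 + d_sm, 32, d_latent])      # encodes (t, SM(t)) pairs
decoder = mlp([d_latent + 1, 32, 2 * d_sm])  # outputs mean and log-std

def condition_and_predict(obs_t, obs_sm, query_t):
    """Condition on observed (t, SM(t)) pairs; predict a distribution at query_t."""
    pairs = np.concatenate([obs_t[:, None], obs_sm], axis=1)
    r = forward(encoder, pairs).mean(axis=0)   # permutation-invariant average
    out = forward(decoder, np.concatenate([r, [query_t]]))
    mean, log_std = out[:d_sm], out[d_sm:]
    return mean, np.exp(log_std)               # predicted mean and std-dev

# Condition on two observed time steps, query an intermediate time point.
mean, std = condition_and_predict(np.array([0.0, 1.0]),
                                  rng.standard_normal((2, d_sm)),
                                  query_t=0.5)
```

Because the latent vectors are averaged, the prediction is invariant to the order and number of conditioning points, which is what allows conditioning on any subset of variables at any time step.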
CNMP can also learn multiple modes of operation from multiple demonstrations of a movement primitive. Importantly, CNMP produces movement trajectories in joint or task space, and generates the corresponding motor commands
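The high-level feedback control loop mentioned in the abstract, in which the trajectory is re-conditioned on the actual sensor readings at each time step, can be sketched as follows. The function and parameter names are hypothetical, and a trained CNMP predictor plus robot I/O callbacks are assumed; this is a schematic of the control flow only.

```python
# Hypothetical sketch of the high-level feedback loop: at every step the
# model is conditioned on the goal AND the most recent actual sensor
# reading, so unexpected events shift the remaining predicted trajectory.

def execute_with_feedback(cnmp_predict, read_sensors, send_command,
                          goal_obs, n_steps=100):
    """Re-condition the CNMP on the live sensor reading at every time step.

    cnmp_predict(observations, query_t) -> predicted sensorimotor value
    goal_obs: list of (time, sensorimotor value) goal conditioning pairs
    """
    observations = list(goal_obs)
    for step in range(n_steps):
        t = step / (n_steps - 1)             # normalized time in [0, 1]
        observations_now = observations + [(t, read_sensors())]
        predicted_sm = cnmp_predict(observations_now, query_t=t)
        send_command(predicted_sm)           # actuate toward the prediction
```

If an external intervention changes, say, the object's weight mid-execution, the new force reading enters the conditioning set at the next step and the predicted remainder of the trajectory adapts accordingly.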