Skill Learning and Task Outcome Prediction for Manipulation

Peter Pastor(1), Mrinal Kalakrishnan(1), Sachin Chitta(2), Evangelos Theodorou(1), Stefan Schaal(1)

Abstract— Learning complex motor skills for real-world tasks is a hard problem in robotic manipulation that often requires painstaking manual tuning and design by a human expert. In this work, we present a Reinforcement Learning-based approach to acquiring new motor skills from demonstration. Our approach allows the robot to learn fine manipulation skills and significantly improve its success rate and skill level, starting from a possibly coarse demonstration. It aims to incorporate task domain knowledge, where appropriate, by working in a space consistent with the constraints of the specific task. In addition, we present an approach that uses sensor feedback to learn a predictive model of the task outcome. This allows our system to learn the proprioceptive sensor feedback needed to monitor subsequent executions of the task online and abort execution in the event of predicted failure. We illustrate our approach using two example tasks executed with the PR2 dual-arm robot: a straight and accurate pool stroke, and a box-flipping task using two chopsticks as tools.

I. INTRODUCTION

As robots move into the real world, they will have to acquire complex perceptuo-motor skills. Skill learning is, in general, a hard task for robots. A common approach to this problem has been to use learning from demonstration to teach the robot a new skill, e.g., teaching an industrial manipulator to perform a particular movement sequence. This approach often involves a human expert who guides the robot through the task but must also carefully tune the task parameters by hand to achieve accurate performance. This manual process is often laborious and significantly reduces the flexibility of the robot in learning a variety of tasks.
Moreover, such teaching approaches often neglect the interplay of movement and perception. Conversely, learning entire tasks from scratch by exploring the complete state space is intractable due to the high dimensionality of modern mobile manipulation systems. In this paper, we present an approach to learning skills that bootstraps the skill from imitation learning, refines it with reinforcement learning, and finally learns to anticipate the quality of performance from the statistics of sensory data collected during the movement. An initial solution (or initial policy) is most often demonstrated by a human expert using an appropriate teleoperation framework or motion capture setup. This initial policy is often feasible but not very robust, i.e., its success rate over a series of repeated trials will often be very low. We therefore use a Reinforcement Learning (RL) approach to explore a region around this policy and obtain an optimal solution (final policy).

----
This research was conducted while Peter Pastor and Mrinal Kalakrishnan were interns at Willow Garage. It was additionally supported in part by National Science Foundation grants ECS-0326095, IIS-0535282, IIS-1017134, CNS-0619937, IIS-0917318, CBET-0922784, EECS-0926052, CNS-0960061, the DARPA program on Advanced Robotic Manipulation, the Army Research Office, the Okawa Foundation, and the ATR Computational Neuroscience Laboratories. Evangelos Theodorou was supported by a Myronis Fellowship.

(1) Peter Pastor, Mrinal Kalakrishnan, Evangelos Theodorou, and Stefan Schaal are with the CLMC Laboratory, University of Southern California, Los Angeles, USA. {pastorsa, kalakris, etheodor, sschaal}@usc.edu
(2) Sachin Chitta is with Willow Garage Inc., Menlo Park, CA 94025, USA. sachinc@willowgarage.com
----
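The idea of exploring a region around a demonstrated policy can be made concrete with a toy sketch (this is an illustration in the spirit of reward-weighted exploration, not the paper's implementation): random perturbations of the policy parameters are evaluated, and the parameters are updated with a cost-weighted average of the perturbations. The quadratic cost below is a hypothetical stand-in for a real task cost.

```python
import numpy as np

def rollout_cost(theta):
    # Toy cost: distance of the parameters from an (unknown to the
    # learner) optimum at theta = 1. A real system would execute the
    # policy on the robot and measure task performance instead.
    return float(np.sum((theta - 1.0) ** 2))

def exploration_update(theta, rng, n_rollouts=20, noise_std=0.1, h=10.0):
    # Sample perturbations around the current parameters and evaluate
    # the cost of each perturbed rollout.
    eps = rng.normal(0.0, noise_std, size=(n_rollouts, theta.size))
    costs = np.array([rollout_cost(theta + e) for e in eps])
    # Exponentiate and normalize the costs so that low-cost rollouts
    # dominate the parameter update.
    w = np.exp(-h * (costs - costs.min()) / (np.ptp(costs) + 1e-10))
    w /= w.sum()
    return theta + w @ eps

rng = np.random.default_rng(0)
theta = np.zeros(3)                 # coarse "demonstrated" parameters
initial_cost = rollout_cost(theta)
for _ in range(100):
    theta = exploration_update(theta, rng)
```

After the updates, the parameters have moved toward the low-cost region while only ever sampling locally around the current policy, which is what keeps the search tractable compared to exploring the full state space.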
Our approach is based on the PI² algorithm [1], which requires minimal manual parameter tuning and has been shown to work robustly and efficiently, in contrast to previous RL approaches, even on very high-dimensional motor systems.

Fig. 1: The PR2 robot learning a pool stroke and manipulating a box using chopsticks.

While learning a robust policy in the proposed manner will allow the robot to execute the desired skills, it is important to note that, due to the uncertainty inherent in real-world tasks, the robot may still fail from time to time. Thus, the ability to predict the outcome of a task based on previous experience is very valuable, and in particular the ability to predict a negative outcome before the failure actually occurs. Humans seem to learn to predict sensory information for most of the tasks they perform on a daily basis. For example, during walking, humans continuously predict the ground reaction forces they expect to feel at the soles of their feet, enabling them to instantly detect and react to deviations that might arise from changes in ground terrain (such as stepping into a pothole). For robots, such prediction of performance and sensory events is either largely missing or, where it exists, is the result of a human expert spending considerable time manually tuning thresholds.

In this paper, we introduce a method that uses sensory statistics from multiple trials of a specific skill to predict the outcome of new executions of the same skill. We demonstrate the generality of our approach through an application to learning skills for two particular tasks using the PR2 mobile manipulation system. The first task involves learning a straight, accurate, and fast pool shot. This behavior was previously demonstrated on the PR2 [2] using a hand-tuned behavior that required additional hardware to be added to the robot.
In contrast, starting from an initial demonstration, we learn a fast and accurate stroke motion for the robot gripping a pool stick in a more natural manner. The second task uses additional tools, a chopstick grasped in each hand, to flip a box placed on a table. This fine manipulation task requires more dexterity than the pool task. We demonstrate how proprioceptive sensor information can be used to achieve a more gentle execution of this task while also providing a predicted outcome of the task during execution.
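The outcome-prediction idea can be illustrated with per-timestep statistics over successful trials: record a sensor trace (e.g., a force signal) for each successful execution, build a mean and standard-deviation band, and flag a new execution for abort as soon as its trace leaves the band. This is a minimal sketch under assumed one-dimensional, time-aligned traces; the class name, signal shapes, and threshold are illustrative choices, not the paper's.

```python
import numpy as np

class OutcomePredictor:
    def __init__(self, successful_traces, z_threshold=5.0):
        # Per-timestep statistics over trials; traces: (n_trials, n_steps)
        traces = np.asarray(successful_traces)
        self.mean = traces.mean(axis=0)
        self.std = traces.std(axis=0) + 1e-6   # avoid division by zero
        self.z = z_threshold

    def first_failure_step(self, trace):
        """Return the first timestep whose z-score exceeds the threshold,
        or None if the whole trace stays inside the learned band."""
        scores = np.abs((np.asarray(trace) - self.mean) / self.std)
        bad = np.nonzero(scores > self.z)[0]
        return int(bad[0]) if bad.size else None

# Build statistics from noisy successful trials around a nominal profile.
rng = np.random.default_rng(1)
nominal = np.sin(np.linspace(0.0, np.pi, 50))
good = nominal + rng.normal(0.0, 0.02, size=(30, 50))
pred = OutcomePredictor(good)

ok_trial = nominal + rng.normal(0.0, 0.02, size=50)
bad_trial = ok_trial.copy()
bad_trial[25:] += 0.5                          # simulated failure event
```

In this sketch, `ok_trial` stays within the band, while `bad_trial` is flagged at the step where the simulated disturbance begins, which is the point at which an online monitor could abort execution.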