Learning Robot Grasping from 3-D Images with Markov Random Fields

Abdeslam Boularias, Oliver Kroemer, Jan Peters

Abstract— Learning to grasp novel objects is an essential skill for robots operating in unstructured environments. We therefore propose a probabilistic approach for learning to grasp. In particular, we learn a function that predicts the success probability of grasps performed on surface points of a given object. Our approach is based on Markov Random Fields (MRFs), and is motivated by the fact that points that are geometrically close to each other tend to have similar grasp success probabilities. The MRF approach is successfully tested in simulation, and on a real robot using 3-D scans of various types of objects. The empirical results show a significant improvement over methods that do not utilize the smoothness assumption and instead classify each point independently of the others.

I. INTRODUCTION

A wide range of domestic tasks for service robots are based on object manipulation. Examples include collecting objects, loading or unloading a dishwasher, and opening doors. These tasks require the robot to localize objects and to grasp them efficiently. Grasping is therefore one of the most fundamental problems in robotics. The problem is particularly challenging in unstructured environments, since even familiar object categories contain a wide range of shapes and sizes. Despite these variations, humans can learn how to grasp objects from a small number of examples, and efficiently generalize the learned skills to novel objects.

In this paper, we propose a probabilistic framework for teaching an autonomous robot to grasp new objects. The robot is equipped with a 3-D vision system, such as a Kinect or a time-of-flight camera. Given the importance of grasping for robots, a variety of approaches have been proposed [1].
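The smoothness assumption underlying the MRF can be illustrated with a small sketch. The following Python snippet is an illustrative simplification, not the model used in this paper: the k-nearest-neighbor graph, the mean-field-style averaging, and all parameter values are assumptions chosen for demonstration. It shows how noisy per-point grasp-success probabilities can be smoothed over a neighborhood graph built from 3-D surface points, so that an outlier prediction is pulled toward the values of its geometric neighbors.

```python
import numpy as np

def knn_graph(points, k=5):
    """Build a k-nearest-neighbor index array from 3-D surface points."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)          # a point is not its own neighbor
    return np.argsort(d, axis=1)[:, :k]  # indices of the k closest points

def smooth_probabilities(p_unary, neighbors, weight=0.5, iters=20):
    """Mean-field-style smoothing: each point's grasp-success probability
    is repeatedly pulled toward the average of its geometric neighbors."""
    p = p_unary.copy()
    for _ in range(iters):
        p = (1.0 - weight) * p_unary + weight * p[neighbors].mean(axis=1)
    return p

# Toy example: 10 points along a line, with one noisy unary prediction
pts = np.linspace(0.0, 1.0, 10).reshape(-1, 1).repeat(3, axis=1)
unary = np.array([0.9, 0.8, 0.1, 0.85, 0.9, 0.2, 0.15, 0.1, 0.25, 0.1])
smoothed = smooth_probabilities(unary, knn_graph(pts, k=2))
```

Here the third point's low prediction (0.1) sits between two high-probability neighbors, so smoothing raises it substantially, which mirrors the intuition that nearby surface points should receive similar grasp scores.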
Until the last decade, most of these techniques relied on complete and accurate 3-D models of the objects in order to apply analytical methods from mechanics [2]. Building accurate models for new objects is difficult and often requires laser scanning. Additionally, surface properties, such as friction and compliance, are essential for these approaches. However, these properties are not easy to measure, and are often modelled as uniform over the whole object.

Abdeslam Boularias, Oliver Kroemer and Jan Peters are with the Max-Planck Institute for Intelligent Systems in Tübingen, Germany. Oliver Kroemer and Jan Peters are also affiliated with the Technische Universitaet Darmstadt, Intelligent Autonomous Systems Group, Darmstadt, Germany. {abdeslam.boularias,oliverkro,jan.peters}@tuebingen.mpg.de The authors contributed equally to this work.

Fig. 1. Barrett hand equipped with a SwissRanger time-of-flight camera

An alternative approach is the use of statistical methods for learning to grasp. These methods have received increased attention in recent years [3]–[6]. For example, de Granville et al. [3] explored the problem of representing the orientation of a hand as it approaches an object, and demonstrated the feasibility of extracting canonical grasps from a human demonstration. Canonical grasps were represented using clustering based on mixture distributions. Another approach [6] combines analytical and empirical methods by segmenting an object into a set of superquadrics and then learning which ones are more suitable for grasping.

Vision-based methods have also been widely explored. Earlier work on grasping using vision modeled an object as a set of primitive shapes, such as spheres, cylinders, cones and boxes, and then used a set of rules to generate grasp positions and orientations [7].
Pelossof et al. [8] used an SVM to learn a grasp quality measure, where the grasping parameters correspond to the degrees of freedom of a hand. Rao et al. [9] used 3-D scan data points of a given object to segment it, and then used a classifier to select only graspable segments based on their color and geometric features. Saxena et al. [4] also showed that machine learning methods can be successfully applied to grasping novel objects. More specifically, they used 2-D images of the same object taken from different angles and learned a logistic function that predicts the position of a good grasping point based on its visual features. A grasping point is defined as a small region on the surface of the object that a human, using a two-fingered pinch grasp, would choose for grasping.

Recently, Jiang et al. [10] showed how to learn a grasp rectangle from 3-D images and use it to estimate a full 7-dimensional gripper configuration. This grasp rectangle is defined by taking into account features of both the object and the gripper. Another vision-based approach [6] first segments the object using the Gaussian curvature as an indicator of the separation points. The segments are approximated with superquadric primitive shapes. A neural network is then trained to learn which segments can be used for grasping.