State Estimation in Contact-Rich Manipulation

Florian Wirnshofer¹, Philipp S. Schmitt¹, Philine Meister¹, Georg v. Wichert¹ and Wolfram Burgard²

Abstract— This paper introduces a Bayesian state estimator for contact-rich manipulation tasks with application in non-prehensile manipulation, industrial assembly or in-hand localization. The core idea of our approach is to explicitly model both the contact dynamics and a torque-based robot controller as part of the underlying system model. Our approach is capable of estimating the state of movable objects for various robot kinematics and geometries of robots and objects. This includes complex scenarios with multiple robots, multiple objects and articulated objects. We have validated our approach in simulation and on a physical robot. The experiments show that multi-modal distributions of six degrees of freedom object poses can be accurately tracked in real-time in a complex manipulation scenario.

I. INTRODUCTION

Various automation tasks require robots that interact with objects in their environment. Tasks range from material handling and logistics to potential household applications in the future. In unstructured environments, robots must perceive the poses of manipulated objects. Perceiving objects with cameras is challenging as occlusions occur between robot and objects and among different objects. These occlusions happen at times where an accurate perception is most important: during interactions with objects, such as grasping or assembly. For this reason, it is important to estimate the state of manipulated objects based on sensor-feedback available in contact.

A natural approach to estimation is to employ Bayesian reasoning. However, several aspects of contact-rich manipulation render this a challenging problem:

1) Variety of manipulation tasks: Various robot geometries and kinematics exist, such as multi-robot systems or dexterous multi-fingered grippers.
The same applies to the geometry and number of objects.

2) Complex, dynamic interactions in contact: Predicting the interactions between multiple robots and objects, such as shown in Fig. 1, requires reasoning about the dynamical evolution of object states. To do so, the velocity of objects and the forces stemming from gravity, contact and friction must be considered.

3) High-dimensional distributions: Estimating the state of motion of manipulated objects leads to inherently high-dimensional distributions. Each object has at least 12 degrees-of-freedom (DoF), six for its pose and six for its velocity.

Fig. 1. Peg-in-hole assembly using two 7-DoF torque-controlled robots: the pose of a box with a hole has high uncertainty and renders a direct insertion of the peg impossible (I). We improve the pose estimate of the box through contact-seeking manipulator motions (II, III). The improved estimate allows for a successful peg insertion (IV).

¹ Siemens Corporate Technology, Otto-Hahn-Ring 6, Munich, Germany
² Dep. of Computer Science, University of Freiburg, Freiburg, Germany

To the best of our knowledge, there currently exists no approach that addresses all three of these aspects. The contribution of this paper is to propose a new probabilistic model for state estimation in contact-rich manipulation. This model comprises the physical dynamics of contact as well as an explicit model of a compliant, trajectory-following controller for the robot. We propose a particle filter-based approach to sequentially estimate the probability distribution of object states. The combination of a particle representation with an integrated model of a compliant controller and contact dynamics makes it possible to estimate a multi-modal distribution over object poses in real-time. We experimentally validated our approach for a dual-robot scenario and an in-hand localization scenario. Note that we assume the object geometries are given.
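To make the particle-filter idea above more concrete, the following is a minimal one-dimensional sketch, not the model proposed in this paper. It assumes a single fixed object at an unknown position, a finger that follows its commanded position until it touches the object, and a noisy finger-position reading as the only measurement. All function names, noise levels and the contact rule are hypothetical and chosen purely for illustration.

```python
import math
import random


def predicted_finger_position(q_cmd: float, x_obj: float) -> float:
    """Toy contact model (assumption): the compliant finger tracks the
    command until it is stopped by the object at x_obj."""
    return min(q_cmd, x_obj)


def gaussian_likelihood(error: float, sigma: float) -> float:
    """Unnormalized Gaussian likelihood of a measurement error."""
    return math.exp(-0.5 * (error / sigma) ** 2)


def particle_filter_step(particles, q_cmd, q_meas, rng,
                         process_sigma=0.002, meas_sigma=0.01):
    """One predict / weight / resample cycle over object-position particles."""
    # Predict: small random diffusion keeps the particle set diverse.
    predicted = [x + rng.gauss(0.0, process_sigma) for x in particles]
    # Weight: compare the measured finger position with the position each
    # particle's object hypothesis would produce under the contact model.
    weights = [
        gaussian_likelihood(q_meas - predicted_finger_position(q_cmd, x),
                            meas_sigma)
        for x in predicted
    ]
    if sum(weights) == 0.0:
        # Fallback for numerical underflow: treat all particles as equal.
        weights = [1.0] * len(predicted)
    # Resample in proportion to the weights.
    return rng.choices(predicted, weights=weights, k=len(predicted))


def run_demo(seed: int = 0) -> float:
    """Slide the finger from 0.01 to 0.50 and return the mean estimate."""
    rng = random.Random(seed)
    x_true = 0.30  # hypothetical ground-truth object position
    particles = [rng.uniform(0.0, 1.0) for _ in range(500)]
    for step in range(1, 51):
        q_cmd = 0.01 * step
        # Simulated joint-position reading; the filter deliberately assumes
        # a larger noise level (meas_sigma) than is simulated here.
        q_meas = predicted_finger_position(q_cmd, x_true) + rng.gauss(0.0, 0.005)
        particles = particle_filter_step(particles, q_cmd, q_meas, rng)
    return sum(particles) / len(particles)


if __name__ == "__main__":
    print(f"estimated object position: {run_demo():.3f}")
```

In this sketch, free motion of the finger rules out object hypotheses that lie behind it, while a stalled finger concentrates the belief around the contact point. This shows in miniature how position feedback alone can carry information about an object once a contact model and a controller model are part of the prediction.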
We do not use any feedback other than joint positions.

II. RELATED WORK

Tracking object poses during manipulation constitutes a demanding yet relevant challenge that has attracted substantial research interest throughout the past decades. Jia et al. [1] achieve object tracking during planar grasping using state observers known from deterministic control theory. Our work covers non-prehensile manipulation tasks, where large and non-linear uncertainties favor probabilistic approaches.

Bayesian state estimation provides an alternative to classical observer-based methods. In Bayesian estimation, the knowledge about a state is represented as a probability density conditioned on the available data [2]. Since the exact computation of the belief is intractable, assumptions must be made with regard to the representation of the involved probability distributions. Lowrey et al. [3] and Pfanne et al. [4] make use of Gaussian belief approximations in order