Online Context-based Object Recognition for Mobile Robots

J.R. Ruiz-Sarmiento*, Martin Günther†, Cipriano Galindo*, Javier González-Jiménez* and Joachim Hertzberg†‡
* Dept. of System Eng. and Automation, Instituto de Investigación Biomédica de Málaga (IBIMA), University of Málaga, Spain. {jotaraul,cgalindo,javiergonzalez}@uma.es
† DFKI Robotics Innovation Center, Osnabrück Branch, 49076 Osnabrück, Germany. {martin.guenther, joachim.hertzberg}@dfki.de
‡ Institute of Computer Science, Osnabrück University, Albrechtstr. 28, 49076 Osnabrück, Germany

Abstract—This work proposes a robotic object recognition system that takes advantage, in an online fashion, of the contextual information latent in human-like environments. To fully leverage context, perceptual information is needed from (at least) a portion of the scene containing the objects of interest, which may not be entirely covered by a single sensor observation. Information from a larger portion of the scenario could still be considered by progressively registering observations, but this approach runs into difficulties under some circumstances, e.g., limited and heavily demanded computational resources, dynamic environments, etc. Instead, the proposed recognition system relies on an anchoring process for the fast registration and propagation of objects' features and locations beyond the current sensor frustum. In this way, the system builds a graph-based world model containing the objects in the scenario (both in the current and in previously perceived shots), which is exploited by a Probabilistic Graphical Model (PGM) in order to leverage contextual information during recognition. We also propose a novel way to include the outcome of local object recognition methods in the PGM, which reduces the usually high CRF learning complexity.
A demonstration of our proposal has been conducted employing a dataset captured by a mobile robot in restaurant-like settings, showing promising results.

I. INTRODUCTION

Nowadays, object recognition systems tend to incorporate contextual information between objects, which has been proven to increase the performance of local object recognition methods, i.e., those that rely only on features of the objects themselves (such as their geometry or appearance) and neglect the intrinsic relations among objects in the scene [1]. Consider a classic scenario from the Artificial Intelligence and Robotics fields: a waiter robot checking the tables' configuration in a restaurant. Contextual information can guide the recognition process by stating that a long, thin object to the left of a plate is more likely to be a fork than a spoon, since that is the common, preferred configuration.

A large, growing body of literature has resorted to the Probabilistic Graphical Models (PGMs) framework [2] for modeling contextual relations [3–10]. In this framework, a set of weights is learned in a supervised training process [11] and then exploited by probabilistic inference to categorize and recognize sensory data. Applied to object recognition, the learned weights are associated with the different object classes (e.g., mug, vase, milk-pot, etc.), the features used to characterize them (e.g., color, height, size, etc.), and their contextual relations (e.g., distance between two objects, relative position with regard to a supporting surface, etc.).

Fig. 1: Robot observing a partial view of a tabletop. Notice that only a part of the table is captured, but the world state model used by the object anchoring process still retains previously observed objects and relations.
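To make the role of these weights concrete, the following is a toy sketch (our own illustration, not the paper's actual model) of a pairwise CRF-style score: unary weights relate classes to local features, pairwise weights encode contextual compatibility under a relation, and MAP inference picks the jointly best labeling. All class names, features, and weight values here are hypothetical.

```python
import itertools

CLASSES = ["fork", "spoon", "plate"]

# Unary term: how well a class matches local features (elongation, size).
def unary(label, feats):
    w = {"fork":  (2.0, -1.0),    # long, small
         "spoon": (1.5, -1.0),
         "plate": (-2.0, 2.0)}[label]
    return w[0] * feats["elongation"] + w[1] * feats["size"]

# Pairwise term: contextual compatibility of (label_i, label_j) under a relation.
def pairwise(li, lj, relation):
    # "left_of" a plate favours fork over spoon (the common table setting).
    if relation == "left_of" and lj == "plate":
        return {"fork": 1.0, "spoon": 0.2}.get(li, 0.0)
    return 0.0

def best_labeling(objects, relations):
    """Exhaustive MAP inference over the scene graph (fine for tiny scenes)."""
    best, best_score = None, float("-inf")
    for labels in itertools.product(CLASSES, repeat=len(objects)):
        s = sum(unary(labels[i], o) for i, o in enumerate(objects))
        s += sum(pairwise(labels[i], labels[j], r) for i, j, r in relations)
        if s > best_score:
            best, best_score = labels, s
    return best

# An elongated object to the left of a flat, large one: context breaks
# the fork/spoon ambiguity in favour of the fork.
objects = [{"elongation": 0.9, "size": 0.1},   # long, thin
           {"elongation": 0.1, "size": 0.9}]   # flat, large
relations = [(0, 1, "left_of")]                # object 0 is left of object 1
print(best_labeling(objects, relations))       # ('fork', 'plate')
```

In a real system, exhaustive enumeration would of course be replaced by approximate inference (e.g., loopy belief propagation), and the weights would be learned rather than hand-set.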
Most works address the problem through one-shot recognition systems [3], [4], [9], [10], which recognize objects relying on single observations of the scene (in the form of RGB, depth, or RGB-D images). Regarding the exploitation of contextual information, one-shot systems are seriously limited by the sensor frustum and possible occlusions, given that they can observe only a portion of the objects and relations appearing in the inspected scene. Some approaches cope with this issue by registering a number of observations prior to the recognition process in order to obtain a wider view of the scene [5–10]. However, the time and computational resources needed for gathering and registering such observations prevent their use in most robotic applications. Less attention has been paid to online recognition methods, which can mitigate these drawbacks by incorporating and exploiting objects and contextual relations that do not appear in the current sensor observation but were previously perceived by the robot.

Draft Version. Final version published in the 17th International Conference on Autonomous Robot Systems and Competitions (ICARSC)
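The online alternative can be sketched as follows. This is a minimal, hypothetical illustration of an anchored world model (names and matching rule are our own, not the paper's implementation): each new percept is matched to an existing anchor by position, so objects observed in earlier shots persist in the graph even when they fall outside the current sensor frustum.

```python
import math

class WorldModel:
    """Graph-based world model: anchors as nodes, contextual relations as edges."""
    def __init__(self, match_radius=0.2):
        self.anchors = []            # persistent objects
        self.relations = []          # edges (i, j, relation_name)
        self.match_radius = match_radius

    def _nearest(self, pos):
        # Simplistic data association: first anchor within the match radius.
        for idx, a in enumerate(self.anchors):
            if math.dist(pos, a["pos"]) <= self.match_radius:
                return idx
        return None

    def integrate(self, observation):
        """Anchor one percept: update a matching anchor or create a new one."""
        idx = self._nearest(observation["pos"])
        if idx is None:
            self.anchors.append(dict(observation))
            idx = len(self.anchors) - 1
        else:
            self.anchors[idx].update(observation)   # propagate fresh features
        return idx

wm = WorldModel()
a = wm.integrate({"pos": (0.0, 0.0), "color": "white"})   # first shot: a plate
b = wm.integrate({"pos": (0.5, 0.0), "color": "grey"})    # first shot: cutlery
wm.relations.append((b, a, "left_of"))
# The second shot re-observes only the plate; the cutlery anchor survives,
# so its node and the "left_of" relation remain available to the PGM.
wm.integrate({"pos": (0.02, 0.0), "color": "white"})
print(len(wm.anchors))   # 2
```

A full anchoring process would also track uncertainty, timestamps, and symbolic identity; the point here is only that recognition can then query relations accumulated across shots rather than a single frustum.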