Robust model-based scene interpretation by multilayered context information

Sungho Kim *, In So Kweon
Department of EECS, Korea Advanced Institute of Science and Technology, 373-1 Gusong-dong, Yuseong-gu, Daejeon, Republic of Korea

Received 26 December 2005; accepted 25 September 2006; available online 6 December 2006

Abstract

In this paper, we present a new graph-based framework for collaborative place, object, and part recognition in indoor environments. We consider a scene to be an undirected graphical model composed of a place node, object nodes, and part nodes connected by undirected links. Our key contribution is the introduction of collaborative place and object recognition (which we call the hierarchical context in this paper), in place of object-only recognition or a causal relation from place to objects. We unify the hierarchical context and the well-known spatial context into a complete hierarchical graphical model (HGM). In the HGM, object and part nodes contain labels and related pose information, rather than only a label, for robust inference of objects. The most difficult problems of the HGM are learning and inferring variable graph structures. We learn the HGM in a piecewise manner, instead of by joint graph learning, for tractability. Since the inference includes variable structure estimation with the marginal distribution of each node, we approximate the pseudo-likelihood of the marginal distribution using multimodal sequential Monte Carlo with weights updated by belief propagation. Data-driven multimodal hypotheses and context-based pruning yield correct inference. For successful recognition, issues related to 3D object recognition are also considered, and several state-of-the-art methods are incorporated. The proposed system greatly reduces false alarms by using the spatial and hierarchical contexts.
We demonstrate the feasibility of HGM-based collaborative place, object, and part recognition in actual large-scale environments for guidance applications (12 places, 112 3D objects).

© 2006 Elsevier Inc. All rights reserved.

Keywords: Hierarchical context; Spatial context; Collaborative; Place–object–part recognition; Piecewise learning; Multimodal sequential Monte Carlo; Hierarchical graphical model

1. Introduction

Consider visitors to a new environment. They have no prior information about the environment, so they may need a guidance system to acquire place and related object information. This paper is concerned with the problem of recognizing places and objects in real environments, as shown in Fig. 1. The scope of the place and object recognition is to identify places and objects, with their parts, in the form of place labels and object labels with poses. This can be regarded as scene interpretation at the semantic level. In guidance applications, it is important to recognize places and objects in uncontrolled environments, where the camera may move arbitrarily and lighting conditions change.

We are aware of two feasible approaches to place and object recognition. One baseline method regards these as separate problems. The other directly relates places to objects using a Bayesian framework [1] (dotted arrow in Fig. 2): the place is first recognized using gist information from filter responses, and the place information then provides a Bayesian prior distribution over object label, scale, and position [1]. In contrast, we interrelate place, object, and part recognition using an undirected graphical model [3]. Place information can provide contextually related object priors, but conversely, ambiguous places can be discriminated by recognizing contextually related objects. This is the key concept of collaborative place and object recognition. Likewise, object
doi:10.1016/j.cviu.2006.09.004

* Corresponding author. Fax: +82 42 869 5465. E-mail addresses: shkim@rcv.kaist.ac.kr (S. Kim), iskweon@kaist.ac.kr (I.S. Kweon).

Computer Vision and Image Understanding 105 (2007) 167–187 (www.elsevier.com/locate/cviu)
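The collaborative inference idea from the introduction — that an ambiguous place can be disambiguated by contextually related objects, rather than place only feeding priors downward — can be illustrated with sum-product message passing on a tiny undirected star graph. This is a minimal sketch, not the paper's HGM: the place labels, object nodes, compatibility potentials, and detector scores below are all invented for illustration.

```python
import numpy as np

# Toy undirected model: one place node P (kitchen vs. office) linked to
# two object leaf nodes. All numbers are hypothetical, chosen only to
# show object evidence disambiguating an uninformative place prior.

place_prior = np.array([0.5, 0.5])      # ambiguous place evidence

# Pairwise compatibilities psi(place, object_state); rows = place labels,
# columns = object states (present, absent).
psi_mug = np.array([[0.9, 0.1],         # a kitchen strongly expects a mug
                    [0.4, 0.6]])        # an office only weakly expects one
psi_monitor = np.array([[0.2, 0.8],
                        [0.9, 0.1]])

# Local evidence at each object node from a (hypothetical) detector.
ev_mug = np.array([0.8, 0.2])           # mug confidently detected
ev_monitor = np.array([0.1, 0.9])       # no monitor found

def msg_object_to_place(psi, evidence):
    """Sum-product message from an object leaf up to the place node:
    sum over object states of psi(place, state) * evidence(state)."""
    return psi @ evidence

# Place belief = prior times the product of incoming object messages,
# normalized. The object evidence flows *upward*, sharpening the place.
belief = (place_prior
          * msg_object_to_place(psi_mug, ev_mug)
          * msg_object_to_place(psi_monitor, ev_monitor))
belief /= belief.sum()
print(belief)   # posterior over (kitchen, office); kitchen dominates
```

With these invented numbers, the flat 0.5/0.5 place prior becomes a strongly peaked posterior on "kitchen", which is the collaborative (bidirectional) behavior the undirected formulation provides and a causal place-to-object prior does not.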