Multi-class Object Layout with Unsupervised Image Classification and Object Localization

Ser-Nam Lim¹, Gianfranco Doretto², and Jens Rittscher¹

¹ Computer Vision Lab, GE Global Research, Niskayuna, NY 12309
² Dept. of CS & EE, West Virginia University, Morgantown, WV 26506

Abstract. Recognizing the presence of object classes in an image, or image classification, has become an increasingly important topic of interest. Equally important, however, is the capability to locate these object classes in the image. In this paper we consider an approach to these two related problems whose primary goal is to minimize the training requirements, so that new object classes can be added easily, as opposed to approaches that favor training a suite of object-specific classifiers. To this end, we provide an analysis of an exemplar-based approach that leverages unsupervised clustering for classification and sliding window matching for localization. While such an exemplar-based approach is by itself brittle to intraclass and viewpoint variations, we achieve robustness by introducing a novel Conditional Random Field model that facilitates a straightforward accept/reject decision on the localized object classes. The performance of our approach on the PASCAL Visual Object Challenge 2007 dataset demonstrates its efficacy.

1 Introduction

In recent years, the integration of the tasks of image classification and object localization has generated a great amount of interest [1,2,3,4,5,6,7,8,9,10,11]. The first task is concerned with labeling an image with tags describing the object classes depicted in it. The second task is concerned with localizing, typically by a bounding box, the objects described by such tags in the image. The rationale for combining these two tasks is that solving the first one would improve the solution to the second one and vice versa [3].
Additionally, it is of great common interest to know not only the presence of certain object classes in the image, but also their locations. Despite all the progress in image classification [12,13,14] and object detection [15,16,17,18,19], localizing and tagging objects in images remain challenging due to large intraclass variations, illumination and viewpoint changes, partial occlusions, and deformations. In light of these challenges, the sliding window approach (e.g., [3]) has been shown to be one of the more promising approaches to solving such a multi-class object layout problem. It entails the design and use of a suite of trained binary classifiers, each for classifying a specific object class, followed by applying these classifiers to the image in a sliding window fashion.

G. Bebis et al. (Eds.): ISVC 2011, Part I, LNCS 6938, pp. 577–589, 2011. © Springer-Verlag Berlin Heidelberg 2011

Training a suite of object-specific classifiers is, however, a tedious process that typically requires many training images per class and a lengthy optimization procedure. For this reason, adding a new object class becomes non-trivial. To this end, the