HEISELE, KIM, MEYER: OBJECT RECOGNITION WITH 3D MODELS

Object Recognition with 3D Models

Bernd Heisele
bheisele@honda-ri.com
Honda Research Institute
Cambridge, USA

Gunhee Kim
gunhee@cs.cmu.edu
Carnegie Mellon University
Pittsburgh, USA

Andrew J. Meyer
ajmeyer.mit.edu
Massachusetts Institute of Technology
Cambridge, USA

Abstract

We propose techniques for designing and training pose-invariant object recognition systems using realistic 3D computer graphics models. We look at the relation between the size of the training set and the classification accuracy for a basic recognition task and provide a method for estimating the degree of difficulty of detecting an object. We show how to sample, align, and cluster images of objects on the view sphere. We address the problem of training on large, highly redundant data and propose a novel active learning method that generates compact training sets and compact classifiers.

1 Introduction

Over the past five years, the object recognition community has taken on the challenge of developing systems that learn to recognize hundreds of object classes from few examples per class. The standard data sets used for benchmarking these systems [11, 12] contained on average fewer than two hundred images per class, as opposed to the sets of thousands of meticulously segmented object images that were used in earlier work on object detection [21, 24]. Benchmarking on such small data sets is inherently problematic: the test results do not generalize and can be misleading. The reader is referred to [19] for a related discussion on database issues.

There have been efforts to build larger databases of manually annotated, natural images [10, 20, 25]. However, the somewhat arbitrary selection of images and the missing ground truth make it difficult to systematically analyze specific properties of object recognition systems, such as invariance to pose, scale, position, and illumination.
NORB [16] is a database for shape-based object recognition that addresses these issues. Its pictures of objects were taken under controlled viewpoint and illumination. The images were then synthetically altered to add more image variations, such as object rotation, background, and distractors.

© 2009. The copyright of this document resides with its authors. It may be distributed unchanged freely in print or electronic forms.

Taking the idea of controlling the image generation one step further leads to fully synthetic images rendered from realistic 3D computer graphics models. In [4, 8, 19], view-based object recognition systems were trained and evaluated on synthetic images. The face recognition system in [14] and the object recognition system in [17] were trained on views of 3D models and tested on real images. 3D models have also been used in a generative