3D Model based Object Class Detection in An Arbitrary View

Pingkun Yan, Saad M. Khan, Mubarak Shah
School of Electrical Engineering and Computer Science
University of Central Florida
http://www.eecs.ucf.edu/~vision

Abstract

In this paper, a novel object class detection method based on 3D object modeling is presented. Instead of using a complicated mechanism for relating multiple 2D training views, the proposed method establishes spatial connections between these views by mapping them directly to the surface of a 3D model. The 3D shape of an object is reconstructed by using a homographic framework from a set of model views around the object and is represented by a volume consisting of binary slices. Features are computed in each 2D model view and mapped to the 3D shape model using the same homographic framework. To generalize the model for object class detection, features from supplemental views are also considered. A codebook is constructed from all of these features and then a 3D feature model is built. Given a 2D test image, correspondences between the 3D feature model and the testing view are identified by matching the detected features. Based on the 3D locations of the corresponding features, several hypotheses of viewing planes can be made. The one with the highest confidence is then used to detect the object using feature location matching. Performance of the proposed method has been evaluated by using the PASCAL VOC challenge dataset and promising results are demonstrated.

1. Introduction

In recent years, the problem of object detection has received considerable attention from both the computer vision and machine learning communities. The key challenge of this problem is the ability to recognize any member in a category of objects in spite of wide variations in visual appearance due to geometrical transformations, change in viewpoint, or illumination.
In this paper, a novel 3D feature model based object class detection method is proposed to deal with these challenges. The objective of this work is to detect the object given an arbitrary 2D view using a general 3D feature model of the class. In our work, the objects can be arbitrarily transformed (with translation and rotation), and the viewing position and orientation of the camera are arbitrary as well. In addition, camera parameters are assumed to be unknown.

Object detection in such a setting has been considered a very challenging problem due to various difficulties of geometrically modeling relevant 3D object shapes and the effects of perspective projection. In this paper, we exploit a recently proposed 3D reconstruction method using a homographic framework for 3D object shape reconstruction. Given a set of 2D images of an object taken from different viewpoints around the object with unknown camera parameters, which are called model views, the 3D shape of this specific object can be reconstructed using the homographic framework proposed in [10]. In our work, the 3D shape is represented by a volume consisting of binary slices, with 1 denoting the object and 0 the background. By using this method, we can not only reconstruct 3D shapes for the objects to be detected, but also have access to the homographies between the 2D views and the 3D models, which are then used to build the 3D feature model for object class detection.

In the feature modeling phase of the proposed method, SIFT features [12] are computed for each of the 2D model views and mapped to the surface of the 3D model. Since it is difficult to accurately relate 2D coordinates to a 3D model by projecting the 3D model to a 2D view (with unknown camera parameters), we propose to use a homography transformation based algorithm.
Since the homographies have been obtained during the 3D shape reconstruction process, the projection of a 3D model can be easily computed by integrating the transformations of slices from the model to a particular view, as opposed to directly projecting the entire model by estimation of the projection matrix. To generalize the model for object class detection, images of other objects of the class are used as supplemental views. Features from these views are mapped to the 3D model in the same way as for those model views. A codebook is constructed from all of these features and then a 3D feature model is built. The 3D feature model thus combines the 3D shape information and appearance features for robust object class detection.
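
As a rough illustration of projecting the slice-based model into a view by combining per-slice transformations, the following minimal Python sketch warps each binary slice with its own 3x3 homography and takes the union of the results. It is not the authors' implementation: the function names `apply_homography` and `project_volume`, the nested-list data layout, and the nearest-pixel forward mapping are all assumptions made for illustration, and the per-slice homographies are assumed to be already available from the reconstruction step.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]  # 3x3 homography, row-major (assumed layout)
Slice = Sequence[Sequence[int]]     # binary slice: 1 = object, 0 = background


def apply_homography(H: Matrix, x: float, y: float) -> Tuple[float, float]:
    """Map the point (x, y) through H using homogeneous coordinates."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if abs(w) < 1e-12:
        raise ValueError("point maps to infinity under this homography")
    return u / w, v / w


def project_volume(slices: Sequence[Slice],
                   homographies: Sequence[Matrix],
                   height: int,
                   width: int) -> List[List[int]]:
    """Project a stack of binary slices into one view.

    homographies[k] is assumed to map slice k into the target view.
    The projection is the union of all warped slices.
    """
    if len(slices) != len(homographies):
        raise ValueError("need exactly one homography per slice")
    view = [[0] * width for _ in range(height)]
    for slice_k, H_k in zip(slices, homographies):
        for y, row in enumerate(slice_k):
            for x, occupied in enumerate(row):
                if not occupied:
                    continue
                u, v = apply_homography(H_k, x, y)
                # Nearest-pixel forward mapping: simple, but it can leave
                # holes when a homography magnifies the slice.
                col, r = int(round(u)), int(round(v))
                if 0 <= r < height and 0 <= col < width:
                    view[r][col] = 1
    return view


if __name__ == "__main__":
    identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    shift = [[1, 0, 1], [0, 1, 1], [0, 0, 1]]  # translate by (1, 1)
    corner = [[1, 0], [0, 0]]
    for row in project_volume([corner, corner], [identity, shift], 3, 3):
        print(row)
```

Because only the per-slice homographies are used, no camera projection matrix has to be estimated. A practical version would more likely warp backwards from view pixels to slice pixels, which avoids the holes noted in the comment.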