A Supervised Learning Framework for Generic Object Detection in Images

Saad Ali
Computer Vision Lab, University of Central Florida, Orlando, FL, U.S.A.
sali@cs.ucf.edu

Mubarak Shah
Computer Vision Lab, University of Central Florida, Orlando, FL, U.S.A.
shah@cs.ucf.edu

Abstract

In recent years Kernel Principal Component Analysis (Kernel PCA) has gained much attention because of its ability to capture nonlinear image features, which are particularly important for encoding image structure. Boosting has been established as a powerful learning algorithm that can be used for feature selection. In this paper we present a novel framework for object class detection that combines the feature reduction ability of Kernel PCA with the feature selection ability of AdaBoost. The classifier obtained in this way is able to handle changes in object appearance, illumination conditions, and surrounding clutter. A nonlinear subspace is learned for the positive and negative object classes using Kernel PCA. Features are derived by projecting example images onto the learned subspaces. Base learners are modeled using a Bayes classifier. AdaBoost is then employed to discover the features that are most relevant for the object detection task at hand. The proposed method has been successfully tested on a wide range of object classes (cars, airplanes, pedestrians, motorcycles, etc.) using standard data sets and has shown remarkable performance. Using a small training set, a classifier learned in this way was able to generalize over intra-class variation while still maintaining a high detection rate. In most object categories we achieved detection rates above 95% with minimal false alarm rates. We demonstrate the effectiveness of our approach in terms of absolute performance parameters and comparative performance against current state-of-the-art approaches.

1. Introduction

Detection and classification of the object of interest in an unconstrained environment is a challenging problem.
Objects can occur under different visual appearances, poses, lighting conditions, backgrounds, and clutter (Fig. 1). In addition to dealing with these intra-class variations, a successful object detector needs to tackle the diverse imagery that exists in different applications. Automated object detection has a wide range of applications such as surveillance, military target recognition, content-based image retrieval, robotics, image mining, etc. Hence, there is a pressing need for a methodology that can carry out automatic object detection and indexing across a wide range of imagery.

Traditional methods for visual classification involve two steps. First, features are extracted from the image and the object of interest is represented using those features. In the second step, a classifier is learned using the chosen feature representation. Popular classifiers employed for this task include Support Vector Machines, Perceptron, Winnow, Bayes Classifier, Fisher Linear Discriminant, etc. These are termed hyperplane classifiers, which work under the assumption that all features of the data are useful for classification and that the data is linearly separable (or separable by a linear combination of hyperplanes). Unfortunately, images of objects such as cars, persons, airplanes, faces, etc., taken under different photometric and geometric conditions, result in a highly nonlinear and non-convex feature space. The imaging process, and the low-level image features (such as gray levels, color, or texture) derived from the images it produces, are nonlinear functions of various factors. Therefore a simple linear separation of class and non-class images in feature space is not optimal [17, 12]. However, most of the current approaches use color, texture, orientation, or blob features and try to learn a linear classifier using them. Others try to compute similarity measures (L1 or L2 norm) between these high-dimensional features to return the relevant object.
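
The limitation of a purely linear separation can be illustrated with a toy example. The following is a minimal sketch and not the method of this paper: it trains a perceptron (one of the hyperplane classifiers listed above) on XOR-labelled points, first on the raw coordinates and then after adding an explicit product feature. The product feature is only an assumed stand-in for the kind of nonlinear mapping that Kernel PCA provides implicitly, and the names train_perceptron, predict, and lift are hypothetical.

```python
def train_perceptron(samples, labels, epochs=1000):
    """Classic perceptron rule. Returns (weights, bias, converged)."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(samples, labels):
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:  # misclassified or on the boundary
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:  # a full pass with no errors: data is separated
            return w, b, True
    return w, b, False


def predict(w, b, x):
    """Sign of the linear decision function."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


def lift(x):
    """Hypothetical nonlinear map: append the product of the two inputs."""
    return (x[0], x[1], x[0] * x[1])


# XOR labels: no single hyperplane separates the classes in the raw space.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
labels = [-1, 1, 1, -1]

_, _, raw_converged = train_perceptron(points, labels)

lifted_points = [lift(p) for p in points]
w, b, lifted_converged = train_perceptron(lifted_points, labels)
lifted_predictions = [predict(w, b, p) for p in lifted_points]

print("separable in raw space:   ", raw_converged)
print("separable in lifted space:", lifted_converged)
```

In this sketch the perceptron never completes an error-free pass on the raw coordinates, while the same learner separates the classes once the nonlinear feature is added.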
Figure 1: Examples of variation among object categories (Airplane and Cars) in terms of appearance, illumination condition, and background.

Proceedings of the Tenth IEEE International Conference on Computer Vision (ICCV’05) 1550-5499/05 $20.00 © 2005 IEEE

But in high dimensions, data becomes very sparse and distance measures