A Supervised Learning Framework for Generic Object Detection in Images
Saad Ali
Computer Vision Lab
University of Central Florida
Orlando, FL, U.S.A
sali@cs.ucf.edu
Mubarak Shah
Computer Vision Lab
University of Central Florida
Orlando, FL, U.S.A
shah@cs.ucf.edu
Abstract
In recent years Kernel Principal Component Analysis (Ker-
nel PCA) has gained much attention because of its ability
to capture nonlinear image features, which are particularly
important for encoding image structure. Boosting has been
established as a powerful learning algorithm that can be
used for feature selection. In this paper we present a novel
framework for object class detection that combines the fea-
ture reduction and feature selection abilities of Kernel PCA
and AdaBoost respectively. The classifier obtained in this
way is able to handle change in object appearance, illu-
mination conditions, and surrounding clutter. A nonlinear
subspace is learned for positive and negative object classes
using Kernel PCA. Features are derived by projecting example
images onto the learned subspaces. Base learners are
modeled using a Bayes classifier. AdaBoost is then employed
to discover the features that are most relevant for the object
detection task at hand. The proposed method has been
successfully tested on a wide range of object classes (cars,
airplanes, pedestrians, motorcycles, etc.) using standard data
sets and has shown remarkable performance. Using a small
training set, a classifier learned in this way was able to
generalize across intra-class variation while still maintaining
a high detection rate. In most object categories we achieved
detection rates above 95% with minimal false alarm rates.
We demonstrate the effectiveness of our approach in terms
of absolute performance parameters and comparative
performance against current state-of-the-art approaches.
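As an illustration only, the pipeline summarized above (one Kernel PCA
subspace per class, projection features, Bayes base learners, AdaBoost
feature selection) can be sketched as follows. This is a minimal sketch
and not the authors' implementation: the Gaussian (RBF) kernel, the
value of gamma, the Gaussian class-conditional densities in the Bayes
base learners, the two-blob toy data, and all function names are
assumptions made for this example.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """Gaussian kernel matrix exp(-gamma * ||a - b||^2) for all row pairs."""
    d2 = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def fit_kpca(X, n_components, gamma):
    """Learn a kernel PCA subspace from the rows of X (kernel choice is an assumption)."""
    K = rbf_kernel(X, X, gamma)
    col_mean, all_mean = K.mean(axis=0), K.mean()
    # Centre the kernel matrix in feature space (K is symmetric).
    Kc = K - col_mean[None, :] - col_mean[:, None] + all_mean
    vals, vecs = np.linalg.eigh(Kc)
    top = np.argsort(vals)[::-1][:n_components]
    # Scale eigenvectors so the feature-space principal axes have unit norm.
    alphas = vecs[:, top] / np.sqrt(np.maximum(vals[top], 1e-12))
    return {"X": X, "gamma": gamma, "col_mean": col_mean,
            "all_mean": all_mean, "alphas": alphas}

def project(kpca, Z):
    """Project the rows of Z onto a learned kernel PCA subspace."""
    K = rbf_kernel(Z, kpca["X"], kpca["gamma"])
    Kc = K - K.mean(axis=1, keepdims=True) - kpca["col_mean"][None, :] + kpca["all_mean"]
    return Kc @ kpca["alphas"]

def fit_bayes(f, y, w):
    """Weighted 1-D Bayes classifier; Gaussian class-conditionals are an assumption."""
    params = {}
    for c in (-1, 1):
        wc, fc = w[y == c], f[y == c]
        prior = wc.sum()
        mu = (wc * fc).sum() / prior
        var = (wc * (fc - mu) ** 2).sum() / prior + 1e-9
        params[c] = (prior, mu, var)
    return params

def predict_bayes(params, f):
    """Pick the class with the larger log posterior for each feature value."""
    def log_post(c):
        prior, mu, var = params[c]
        return np.log(prior) - 0.5 * np.log(var) - 0.5 * (f - mu) ** 2 / var
    return np.where(log_post(1) > log_post(-1), 1, -1)

def adaboost(F, y, rounds):
    """Discrete AdaBoost; each round keeps the single feature with lowest weighted error."""
    n, d = F.shape
    w = np.full(n, 1.0 / n)
    model = []
    for _ in range(rounds):
        best = None
        for j in range(d):
            params = fit_bayes(F[:, j], y, w)
            pred = predict_bayes(params, F[:, j])
            err = w[pred != y].sum()
            if best is None or err < best[0]:
                best = (err, j, params, pred)
        err, j, params, pred = best
        err = np.clip(err, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)
        w = w * np.exp(-alpha * y * pred)   # up-weight misclassified examples
        w = w / w.sum()
        model.append((alpha, j, params))
    return model

def predict(model, F):
    """Sign of the alpha-weighted vote of the selected base learners."""
    score = sum(a * predict_bayes(p, F[:, j]) for a, j, p in model)
    return np.where(score >= 0, 1, -1)

def demo(seed=0):
    """Toy run on two synthetic 2-D blobs standing in for object / non-object images."""
    rng = np.random.default_rng(seed)
    pos = rng.normal([2.0, 0.0], 0.3, size=(40, 2))
    neg = rng.normal([-2.0, 0.0], 0.3, size=(40, 2))
    X = np.vstack([pos, neg])
    y = np.r_[np.ones(40, dtype=int), -np.ones(40, dtype=int)]
    # One subspace per class; features are projections onto both subspaces.
    subspaces = [fit_kpca(pos, 2, 0.02), fit_kpca(neg, 2, 0.02)]
    F = np.hstack([project(s, X) for s in subspaces])
    model = adaboost(F, y, rounds=5)
    return F, y, model, float((predict(model, F) == y).mean())

if __name__ == "__main__":
    print("training accuracy:", demo()[3])
```

Each boosting round here selects one projection coefficient, so the
list of selected feature indices in the returned model indicates which
subspace directions the ensemble relies on. Real image data would
replace the synthetic blobs with vectorized image patches.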
1. Introduction
Detection and classification of the object of interest in an
unconstrained environment is a challenging problem. Ob-
jects can occur under different visual appearances, poses,
lighting conditions, backgrounds and clutter (Fig. 1). In
addition to dealing with these intra-class variations, a suc-
cessful object detector needs to tackle diverse imagery that
exists in different applications. Automated object detec-
tion has a wide range of applications such as surveillance,
military target recognition, content based image retrieval,
robotics, image mining, etc. Hence, there is a pressing need
for a methodology which can carry out automatic object
detection and indexing across a wide range of imagery.
Traditional methods for visual classification involve two
steps. First, features are extracted from the image and
the object of interest is represented using those features.
In the second step a classifier is learned using the chosen
feature representation. Popular classifiers employed
for this task include Support Vector Machines, Perceptron,
Winnow, the Bayes classifier, Fisher Linear Discriminant, etc.
These are termed hyperplane classifiers, which work under
the assumption that all features of the data are useful
for classification and that the data is linearly separable (or
separable by a linear combination of hyperplanes).
Unfortunately, images of objects such as cars, persons,
airplanes, faces, etc., taken under different photometric and
geometric conditions, result in a highly nonlinear and
non-convex feature space. The imaging process, and the
low-level image features such as gray levels, color or texture
that are derived from images acquired through it, are
nonlinear functions of various factors. Therefore a simple
linear separation of class and non-class images in feature
space is not optimal [17, 12]. However, most of the current
approaches use
color, texture, orientation or blob features and try to learn
a linear classifier using them. Others try to compute
similarity measures (L1 or L2 norm) between these
high-dimensional features to return the relevant object. But in
high dimensions, data becomes very sparse and distance measures
Figure 1: Examples of variation among object categories
(Airplane and Cars) in terms of appearance, illumination condition,
and background.
Proceedings of the Tenth IEEE International Conference on Computer Vision (ICCV’05)
1550-5499/05 $20.00 © 2005 IEEE