586 IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 19, NO. 4, APRIL 2008
Shared Feature Extraction for Nearest
Neighbor Face Recognition
David Masip and Jordi Vitrià
Abstract—In this paper, we propose a new supervised linear
feature extraction technique for multiclass classification problems
that is specially suited to the nearest neighbor classifier (NN).
The problem of finding the optimal linear projection matrix is
defined as a classification problem and the Adaboost algorithm
is used to compute it in an iterative way. This strategy allows
the introduction of a multitask learning (MTL) criterion in the
method and results in a solution that makes no assumptions about
the data distribution and that is especially appropriate to solve
the small sample size problem. The performance of the method
is illustrated by an application to the face recognition problem.
The experiments show that the representation obtained following
the multitask approach improves on classic feature extraction
algorithms when using the NN classifier, especially when we have
a few examples from each class.
Index Terms—Face recognition, feature extraction, multitask
learning (MTL), nearest neighbor classification (NN), small
sample size problem.
I. INTRODUCTION
The integration of computers in our everyday life is increasing every day as technology evolves, making it feasible for new applications to deal with automatic face classification problems; among them we can find face recognition applied to security, biometrics, and the design of more user-friendly interfaces.
Face images are captured as high-dimensional feature vectors, so a dimensionality reduction process is usually necessary.
According to the literature, the dimensionality reduction
techniques can be classified in two categories: feature selection
and feature extraction. In the feature selection [1]–[4] approach,
only a subset from the original feature vector is preserved. In
feature extraction, the original features are combined or trans-
formed into the new extracted features. In this paper, we will
deal with linear feature extraction methods, considering the
feature selection problem as a special case of feature extraction
where the selected features have coefficient 1 in the projection matrix and all other features have coefficient 0.
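As a minimal illustration of this view (a sketch of the general setting, not the paper's method; the function names and NumPy formulation are ours), feature selection can be written as multiplication by a 0/1 projection matrix:

```python
import numpy as np

def extract(X, W):
    """Linear feature extraction: project n x d data X onto the k columns of W."""
    return X @ W

def selection_matrix(d, keep):
    """Build the d x k projection matrix whose columns are canonical basis
    vectors: coefficient 1 for each selected feature index, 0 elsewhere.
    Feature selection is thus a special case of linear feature extraction."""
    W = np.zeros((d, len(keep)))
    for col, idx in enumerate(keep):
        W[idx, col] = 1.0
    return W

X = np.arange(12.0).reshape(3, 4)      # three samples, four features
W = selection_matrix(4, keep=[0, 2])   # keep features 0 and 2
print(extract(X, W))                   # equals X[:, [0, 2]]
```

A general (learned) projection matrix simply replaces the 0/1 columns with arbitrary real-valued combinations of the original features.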
Classification algorithms receive as input a set of training samples, each one represented as a feature vector. In the statistical pattern recognition literature, we can find two important
reasons to reduce the number of features [5]: 1) to alleviate the curse of dimensionality problem, improving the parameter estimation of the classifier [6], and 2) to reduce the storage and computational needs. These advantages are crucial in face recognition applications, where the feature extraction process can mitigate the effect of the noise present in natural images and can also help to find invariant characteristics of the individual, making the later classification step more robust to changes in illumination or partial occlusions.

Manuscript received July 25, 2006; revised May 11, 2007; accepted September 4, 2007. This work was supported by the Ministerio de Educación y Ciencia (MEC), Spain, under Grant TIN2006-15308-C02-01.
D. Masip is with the Universitat Oberta de Catalunya, Barcelona 08018, Spain (e-mail: dmasipr@uoc.edu; davidm@maia.ub.es).
J. Vitrià is with the Computer Vision Center, Computer Science Department, Universitat Autònoma de Barcelona, Barcelona 08193, Spain (e-mail: jordi@cvc.uab.es).
Digital Object Identifier 10.1109/TNN.2007.911742

Fig. 1. Example of classification accuracy as a function of the number of training samples, using the NN classifier and several state-of-the-art feature extraction algorithms. The ARFace database has been used for this experiment.
Typically, in face recognition problems, the number of images
from each class is considerably limited: only one or two faces
can be acquired from each person. Under this assumption, most
of the traditional feature extraction methods suffer a significant
performance drop. Fig. 1 shows an example of the evolution
of the classification accuracy when applying different feature
extraction algorithms with different numbers of training samples.
This phenomenon is known as the small sample size problem.
In addition, in face recognition problems, the dimensionality
of the input space is rather large. In this context, many statis-
tical classifiers fail in the density estimation task, making the
nonparametric classifiers a good alternative. In this paper, we
focus on the nearest neighbor classification rule (NN), which
is one of the most widely used despite its simplicity. Assuming the
NN classifier, the main motivation of this work is to provide a
multiclass feature extraction method that performs no specific
distribution assumptions on the data, and that can improve the
results of the NN classification even when only a few samples
per class are available.
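To make this setting concrete, here is a minimal sketch (ours, not the paper's AdaBoost-based method) of the NN rule applied in a linearly projected space, assuming a projection matrix W has already been learned:

```python
import numpy as np

def nn_predict(W, X_train, y_train, X_test):
    """1-NN rule in the projected space: each test sample receives the
    label of its nearest (Euclidean) projected training sample."""
    Z_train, Z_test = X_train @ W, X_test @ W
    # Squared Euclidean distances between every test/train pair.
    d2 = ((Z_test[:, None, :] - Z_train[None, :, :]) ** 2).sum(axis=-1)
    return y_train[d2.argmin(axis=1)]

# Tiny usage example: two well-separated classes in 3-D, projected
# onto their first two coordinates (the third coordinate is ignored).
W = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
X_train = np.array([[0.0, 0.0, 5.0], [4.0, 4.0, -5.0]])
y_train = np.array([0, 1])
X_test = np.array([[0.5, 0.2, 9.9], [3.8, 4.1, 0.1]])
print(nn_predict(W, X_train, y_train, X_test))
```

Because the rule only compares distances, it requires no density estimation and makes no distributional assumptions; a representation that improves the neighborhood structure therefore directly improves NN accuracy.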
The first contribution of this paper is to propose a new itera-
tive feature extraction method for multiclass classification under