586 IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 19, NO. 4, APRIL 2008

Shared Feature Extraction for Nearest Neighbor Face Recognition

David Masip and Jordi Vitrià

Abstract—In this paper, we propose a new supervised linear feature extraction technique for multiclass classification problems that is specially suited to the nearest neighbor (NN) classifier. The problem of finding the optimal linear projection matrix is defined as a classification problem, and the Adaboost algorithm is used to compute it iteratively. This strategy allows the introduction of a multitask learning (MTL) criterion in the method and results in a solution that makes no assumptions about the data distribution and is especially well suited to the small sample size problem. The performance of the method is illustrated by an application to the face recognition problem. The experiments show that the representation obtained with the multitask approach improves on classic feature extraction algorithms when using the NN classifier, especially when only a few examples from each class are available.

Index Terms—Face recognition, feature extraction, multitask learning (MTL), nearest neighbor classification (NN), small sample size problem.

I. INTRODUCTION

THE integration of computers in our everyday life increases as technology evolves, making feasible new applications that deal with automatic face classification problems; among them we find face recognition applied to security, biometrics, and the design of more user-friendly interfaces. Face images are captured as high-dimensional feature vectors, so a dimensionality reduction process is usually necessary. According to the literature, dimensionality reduction techniques can be classified into two categories: feature selection and feature extraction. In the feature selection approach [1]–[4], only a subset of the original feature vector is preserved.
In feature extraction, the original features are combined or transformed into the new extracted features. In this paper, we deal with linear feature extraction methods, considering the feature selection problem as a special case of feature extraction in which the selected features have coefficient 1 in the projection matrix and the remaining features have coefficient 0.

Classification algorithms receive as input a set of training samples, each represented as a feature vector. In the statistical pattern recognition literature, we can find two important reasons to reduce the number of features [5]: 1) to alleviate the curse of dimensionality problem, improving the parameter estimation of the classifier [6], and 2) to reduce storage and computational needs. These advantages are crucial in face recognition applications, where the feature extraction process can mitigate the effect of the noise present in natural images and help find invariant characteristics of the individual, making the later classification step more robust to changes in illumination or partial occlusions.

Fig. 1. Example of classification accuracy as a function of the number of training samples using the NN classifier and some state-of-the-art feature extraction algorithms. The ARFace database has been used for this experiment.

Manuscript received July 25, 2006; revised May 11, 2007; accepted September 4, 2007. This work was supported by the Ministerio de Educación y Ciencia (MEC), Spain, under Grant TIN2006-15308-C02-01.
D. Masip is with the Universitat Oberta de Catalunya, Barcelona 08018, Spain (e-mail: dmasipr@uoc.edu; davidm@maia.ub.es).
J. Vitrià is with the Computer Vision Center, Computer Science Department, Universitat Autònoma de Barcelona, Barcelona 08193, Spain (e-mail: jordi@cvc.uab.es).
Digital Object Identifier 10.1109/TNN.2007.911742
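The relationship between linear feature extraction and feature selection described above can be illustrated with a minimal NumPy sketch (the data and matrix shapes are hypothetical, not taken from the paper): projecting samples through a matrix W combines features, and a W whose columns contain a single 1 simply picks out original features.

```python
import numpy as np

# Hypothetical setup: n samples of dimension d, reduced to k features.
rng = np.random.default_rng(0)
n, d, k = 6, 5, 2
X = rng.normal(size=(n, d))        # rows are original feature vectors

# Generic linear feature extraction: each extracted feature is a
# linear combination of the original ones, Y = X @ W, W of shape (d, k).
W = rng.normal(size=(d, k))
Y = X @ W                          # shape (n, k)

# Feature selection as a special case: each column of W has a single
# coefficient 1 (here we keep original features 0 and 3) and 0 elsewhere.
W_sel = np.zeros((d, k))
W_sel[0, 0] = 1.0
W_sel[3, 1] = 1.0
Y_sel = X @ W_sel

# Selecting with a 0/1 projection matrix equals picking columns of X.
assert np.allclose(Y_sel, X[:, [0, 3]])
```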
Typically, in face recognition problems, the number of images from each class is considerably limited: often only one or two face images can be acquired from each person. Under this assumption, most traditional feature extraction methods suffer a significant performance drop. Fig. 1 shows an example of the evolution of the classification accuracy when different feature extraction algorithms are applied with different numbers of training samples. This phenomenon is known as the small sample size problem.

In addition, in face recognition problems, the dimensionality of the input space is rather large. In this context, many statistical classifiers fail in the density estimation task, making nonparametric classifiers a good alternative. In this paper, we focus on the nearest neighbor (NN) classification rule, which is one of the most widely used in spite of its simplicity. Assuming the NN classifier, the main motivation of this work is to provide a multiclass feature extraction method that makes no specific distribution assumptions on the data and that can improve NN classification results even when only a few samples per class are available.

The first contribution of this paper is to propose a new iterative feature extraction method for multiclass classification under

1045-9227/$25.00 © 2008 IEEE
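The NN rule discussed above is simple enough to sketch directly. The following is a minimal 1-NN implementation under Euclidean distance (an illustrative sketch, not the paper's method; the toy data mimics the few-samples-per-class regime with one training sample per class):

```python
import numpy as np

def nn_classify(train_X, train_y, test_X):
    """Plain 1-NN: assign each test point the label of its closest
    training point under Euclidean distance."""
    # Pairwise squared distances, shape (n_test, n_train), via broadcasting.
    d2 = ((test_X[:, None, :] - train_X[None, :, :]) ** 2).sum(axis=-1)
    return train_y[d2.argmin(axis=1)]

# Toy example: two classes, a single training sample each.
train_X = np.array([[0.0, 0.0], [5.0, 5.0]])
train_y = np.array([0, 1])
test_X = np.array([[0.5, -0.2], [4.8, 5.1]])
print(nn_classify(train_X, train_y, test_X))  # -> [0 1]
```

Because the rule is nonparametric, it needs no density estimate of the data; its accuracy in high-dimensional face spaces depends heavily on the feature extraction step that precedes it, which is the setting this paper targets.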