586 IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 19, NO. 4, APRIL 2008

Shared Feature Extraction for Nearest Neighbor Face Recognition

David Masip and Jordi Vitrià

Abstract—In this paper, we propose a new supervised linear feature extraction technique for multiclass classification problems that is specially suited to the nearest neighbor (NN) classifier. The problem of finding the optimal linear projection matrix is defined as a classification problem, and the Adaboost algorithm is used to compute it iteratively. This strategy allows the introduction of a multitask learning (MTL) criterion in the method and results in a solution that makes no assumptions about the data distribution and is especially well suited to the small sample size problem. The performance of the method is illustrated by an application to the face recognition problem. The experiments show that the representation obtained with the multitask approach improves on classic feature extraction algorithms when using the NN classifier, especially when only a few examples from each class are available.

Index Terms—Face recognition, feature extraction, multitask learning (MTL), nearest neighbor classification (NN), small sample size problem.

I. INTRODUCTION

THE integration of computers in our everyday life increases as technology evolves, making feasible new applications that deal with automatic face classification problems; among them we find face recognition applied to security, biometrics, and the design of more user-friendly interfaces. Face images are captured as high-dimensional feature vectors, so a dimensionality reduction process is usually necessary. According to the literature, dimensionality reduction techniques can be classified into two categories: feature selection and feature extraction. In the feature selection approach [1]–[4], only a subset of the original feature vector is preserved.
In feature extraction, the original features are combined or transformed into the new extracted features. In this paper, we deal with linear feature extraction methods, considering the feature selection problem as a special case of feature extraction in which the selected features have coefficient 1 in the projection matrix and the remaining features have coefficient 0.

Classification algorithms receive as input a set of training samples, each represented as a feature vector. In the statistical pattern recognition literature, we can find two important reasons to reduce the number of features [5]: 1) to alleviate the curse of dimensionality problem, improving the parameter estimation of the classifier [6], and 2) to reduce storage and computational needs. These advantages are crucial in face recognition applications, where the feature extraction process can mitigate the effect of the noise present in natural images and help find invariant characteristics of the individual, making the later classification step more robust to changes in illumination or partial occlusions.

Fig. 1. Example of classification accuracy as a function of the number of training samples using the NN classifier and some state-of-the-art feature extraction algorithms. The ARFace database has been used for this experiment.

Manuscript received July 25, 2006; revised May 11, 2007; accepted September 4, 2007. This work was supported by the Ministerio de Educación y Ciencia (MEC), Spain, under Grant TIN2006-15308-C02-01.
D. Masip is with the Universitat Oberta de Catalunya, Barcelona 08018, Spain (e-mail: dmasipr@uoc.edu; davidm@maia.ub.es).
J. Vitrià is with the Computer Vision Center, Computer Science Department, Universitat Autònoma de Barcelona, Barcelona 08193, Spain (e-mail: jordi@cvc.uab.es).
Digital Object Identifier 10.1109/TNN.2007.911742
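The relationship between linear feature extraction and feature selection described above can be illustrated with a minimal NumPy sketch (the data and matrix shapes are hypothetical, not taken from the paper): projecting samples through a matrix W combines features, and a W whose columns contain a single 1 simply picks out original features.

```python
import numpy as np

# Hypothetical setup: n samples of dimension d, reduced to k features.
rng = np.random.default_rng(0)
n, d, k = 6, 5, 2
X = rng.normal(size=(n, d))        # rows are original feature vectors

# Generic linear feature extraction: each extracted feature is a
# linear combination of the original ones, Y = X @ W, W of shape (d, k).
W = rng.normal(size=(d, k))
Y = X @ W                          # shape (n, k)

# Feature selection as a special case: each column of W has a single
# coefficient 1 (here we keep original features 0 and 3) and 0 elsewhere.
W_sel = np.zeros((d, k))
W_sel[0, 0] = 1.0
W_sel[3, 1] = 1.0
Y_sel = X @ W_sel

# Selecting with a 0/1 projection matrix equals picking columns of X.
assert np.allclose(Y_sel, X[:, [0, 3]])
```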
Typically, in face recognition problems, the number of images from each class is considerably limited: often only one or two face images can be acquired from each person. Under this assumption, most traditional feature extraction methods suffer a significant performance drop. Fig. 1 shows an example of the evolution of the classification accuracy when different feature extraction algorithms are applied with different numbers of training samples. This phenomenon is known as the small sample size problem.

In addition, in face recognition problems, the dimensionality of the input space is rather large. In this context, many statistical classifiers fail in the density estimation task, making nonparametric classifiers a good alternative. In this paper, we focus on the nearest neighbor (NN) classification rule, which is one of the most widely used in spite of its simplicity. Assuming the NN classifier, the main motivation of this work is to provide a multiclass feature extraction method that makes no specific distribution assumptions on the data and that can improve NN classification results even when only a few samples per class are available.

The first contribution of this paper is to propose a new iterative feature extraction method for multiclass classification under

1045-9227/$25.00 © 2008 IEEE
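The NN rule discussed above is simple enough to sketch directly. The following is a minimal 1-NN implementation under Euclidean distance (an illustrative sketch, not the paper's method; the toy data mimics the few-samples-per-class regime with one training sample per class):

```python
import numpy as np

def nn_classify(train_X, train_y, test_X):
    """Plain 1-NN: assign each test point the label of its closest
    training point under Euclidean distance."""
    # Pairwise squared distances, shape (n_test, n_train), via broadcasting.
    d2 = ((test_X[:, None, :] - train_X[None, :, :]) ** 2).sum(axis=-1)
    return train_y[d2.argmin(axis=1)]

# Toy example: two classes, a single training sample each.
train_X = np.array([[0.0, 0.0], [5.0, 5.0]])
train_y = np.array([0, 1])
test_X = np.array([[0.5, -0.2], [4.8, 5.1]])
print(nn_classify(train_X, train_y, test_X))  # -> [0 1]
```

Because the rule is nonparametric, it needs no density estimate of the data; its accuracy in high-dimensional face spaces depends heavily on the feature extraction step that precedes it, which is the setting this paper targets.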