IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART B: CYBERNETICS, VOL. 40, NO. 5, OCTOBER 2010

Robust Classifiers for Data Reduced via Random Projections

Angshul Majumdar and Rabab K. Ward

Abstract—The computational cost of most classification algorithms depends on the dimensionality of the input samples. As the dimensionality can be high in many cases, particularly those associated with image classification, reducing the dimensionality of the data becomes a necessity. Traditional dimensionality reduction methods are data dependent, which poses certain practical problems. Random projection (RP) is an alternative dimensionality reduction method that is data independent and bypasses these problems. The nearest neighbor classifier has been used with the RP method in classification problems. To obtain higher recognition accuracy, this study examines the robustness to RP dimensionality reduction of several recently proposed classifiers: the sparse classifier (SC), the group SC (along with their fast versions), and the nearest subspace classifier. Theoretical proofs are offered regarding the robustness of these classifiers to RP. The theoretical results are confirmed by experimental evaluations.

Index Terms—Classification, face recognition, random projection (RP).

I. INTRODUCTION

The term "compressive classification" (CC) was first coined in [1]. It originated with a new paradigm in signal processing called "compressive sampling" or "compressed sensing" (CS) [2], [3]. CS combines dimensionality reduction with data acquisition by collecting a (random) lower dimensional projection of the original data instead of sampling the data directly. CC refers to a new class of classification methods that are robust to data acquired using CS.
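The premise behind CC, that a random lower dimensional projection approximately preserves the geometry of the data, can be illustrated with a short numerical sketch (a minimal illustration with synthetic data and arbitrary dimensions, not the setup used in this paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 50, 4096, 256            # samples, ambient dim, reduced dim

# Synthetic stand-ins for vectorized images
X = rng.standard_normal((n, d))

# Gaussian RP matrix, scaled so that E[||Ax||^2] = ||x||^2;
# acquiring Y amounts to CS-style acquisition of m-dimensional projections
A = rng.standard_normal((m, d)) / np.sqrt(m)
Y = X @ A.T

def pdists(M):
    """Condensed vector of pairwise Euclidean distances."""
    g = np.sum(M * M, axis=1)
    D2 = g[:, None] + g[None, :] - 2.0 * (M @ M.T)
    iu = np.triu_indices(len(M), k=1)
    return np.sqrt(np.maximum(D2[iu], 0.0))

# Each ratio is close to 1: pairwise distances survive the projection
ratios = pdists(Y) / pdists(X)
```

By the Johnson–Lindenstrauss lemma, all pairwise distances among n points are preserved within a factor of 1 ± ε with high probability once m grows on the order of ε⁻² log n, which is what makes distance-based classifiers viable on such data.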
Manuscript received May 26, 2009; revised November 30, 2009; accepted December 7, 2009. Date of publication January 26, 2010; date of current version September 15, 2010. This paper was recommended by Associate Editor F. Karray. The authors are with the Department of Electrical and Computer Engineering, The University of British Columbia, Vancouver, BC V6T 1Z4, Canada (e-mail: angshulm@ece.ubc.ca; rababw@ece.ubc.ca). Digital Object Identifier 10.1109/TSMCB.2009.2038493

Only a few properties are preserved by CS data acquisition, and compressive classifiers are designed to exploit these properties so that the recognition accuracy on data acquired by CS is approximately the same as that on data acquired by traditional sampling. In this work, we discuss a group of such classifiers that are robust to data acquired in this manner and therefore fall under the category of CC.

There is a basic difference that separates CC from conventional classification methods. In conventional classification, the data are acquired by traditional (Nyquist) sampling. Once all the data are obtained, a data-dependent dimensionality reduction technique is employed; data acquisition and dimensionality reduction are disjoint activities. CC operates on data acquired by a CS technique, where dimensionality reduction occurs simultaneously with data acquisition. Thus, CC works with a dimensionality reduction method that is data independent, whereas the dimensionality reduction techniques in traditional classification are data dependent (e.g., principal component analysis, linear discriminant analysis, etc.).

For some practical situations, data-dependent dimensionality reduction methods are not efficient. Consider a practical scenario of face authentication in a bank or an office. In a bank, new clients are added daily to the database, and in some offices, employees are also added on a regular basis.
Suppose that, at a certain time, face images of 200 people are available and, following conventional face recognition methods (e.g., eigenface and Fisherface), a data-dependent dimensionality reduction is employed, resulting in a high- to low-dimensional projection matrix. When images of ten more people are added (e.g., the next day), the projection matrix must be recalculated for all 210 people. Unfortunately, there is no way for the old projection matrix to be updated with the new data (reducing the complexity of such updates is an active area of research [15], [16]). For such cases, a data-independent dimensionality reduction method is desirable.

Such a scenario can easily be handled by CC. CC uses a random projection (RP) matrix for dimensionality reduction. The projection matrix is data independent (it can be a Gaussian- or a Bernoulli-type random matrix or a partial Fourier matrix). Compressive classifiers are data independent in the sense that, unlike support vector machines (SVMs) or artificial neural networks (ANNs), they do not require retraining whenever new data are added.

Dimensionality reduction by RP (i.e., CS data acquisition) [41] gives good results only if the classifier is based on a distance measure (e.g., Euclidean or cosine). Consequently, the nearest neighbor (NN) classifier is robust to such randomly projected data and can be used as a compressive classifier. Other studies have shown empirically that RP can also be used in conjunction with certain ANNs [4] and SVMs [5]. However, both ANNs and SVMs have a data-dependent training phase, i.e., they need to be retrained whenever new data are added. As a result, ANNs and SVMs are not computationally efficient solutions to the aforementioned problem; hence, we will not consider these classifiers in this work. The idea behind CC is to provide data-independent solutions for dimensionality reduction and classification problems.
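The data independence described above can be sketched concretely: the RP matrix is drawn once (Gaussian or Bernoulli), enrolling a new client requires only projecting that client's image, and NN classification then operates on Euclidean distances in the reduced space. The sketch below uses synthetic stand-in data; all dimensions and names are illustrative, not this paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
d, m = 1024, 64                    # ambient and reduced dimensions

# Data-independent projections: drawn once, never retrained
A_gauss = rng.standard_normal((m, d)) / np.sqrt(m)           # Gaussian RP
A_bern = rng.choice([-1.0, 1.0], size=(m, d)) / np.sqrt(m)   # Bernoulli RP

def nn_classify(probe, gallery, labels):
    """NN classification by Euclidean distance in the projected domain."""
    return labels[np.argmin(np.linalg.norm(gallery - probe, axis=1))]

# Gallery of 5 enrolled "clients" (well-separated synthetic faces)
centers = 3.0 * rng.standard_normal((5, d))
gallery = np.vstack([c + 0.1 * rng.standard_normal(d) for c in centers]) @ A_gauss.T
labels = np.arange(5)

# Enrolling client 5 touches neither A_gauss nor the existing gallery rows
new_face = 3.0 * rng.standard_normal(d)
gallery = np.vstack([gallery, (new_face + 0.1 * rng.standard_normal(d)) @ A_gauss.T])
labels = np.append(labels, 5)

# A fresh noisy image of the new client is matched in the reduced space
pred = nn_classify((new_face + 0.1 * rng.standard_normal(d)) @ A_gauss.T,
                   gallery, labels)
```

By contrast, a PCA or LDA projection learned from the first five clients would have to be recomputed once the sixth is enrolled, which is precisely the update cost that CC avoids.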
Traditionally, it is assumed that the training phase is offline, so the constraint on the time/computation spent during training is weak. In this case, the existing sophisticated methods for dimensionality reduction [6]–[12] and classification [13], [14] can be employed. It should be mentioned that some effort toward online training is discernible in current face recognition research [17]. It is likewise traditionally assumed that the training samples are fixed, i.e., do not change with time. However, practical scenarios dictate updating