IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART B: CYBERNETICS, VOL. 40, NO. 5, OCTOBER 2010 1359
Robust Classifiers for Data Reduced via
Random Projections
Angshul Majumdar and Rabab K. Ward
Abstract—The computational cost for most classification algo-
rithms is dependent on the dimensionality of the input samples. As
the dimensionality could be high in many cases, particularly those
associated with image classification, reducing the dimensionality
of the data becomes a necessity. The traditional dimensionality re-
duction methods are data dependent, which poses certain practical
problems. Random projection (RP) is an alternative dimensional-
ity reduction method that is data independent and bypasses these
problems. The nearest neighbor classifier has been used with the
RP method in classification problems. To obtain higher recognition
accuracy, this study looks at the robustness of RP dimensionality
reduction for several recently proposed classifiers—sparse classi-
fier (SC), group SC (along with their fast versions), and the nearest
subspace classifier. Theoretical proofs are offered regarding the
robustness of these classifiers to RP. The theoretical results are
confirmed by experimental evaluations.
Index Terms—Classification, face recognition, random projec-
tion (RP).
I. INTRODUCTION

THE TERM “compressive classification” (CC) was first
coined in [1]. It originated with a new paradigm in sig-
nal processing called “compressive sampling” or “compressed
sensing” (CS) [2], [3]. CS combines dimensionality reduction
with data acquisition by collecting a (random) lower dimen-
sional projection of the original data instead of sampling it. CC
refers to a new class of classification methods that are robust to
data acquired using CS. Only a few properties are preserved by
CS data acquisition, and compressive classifiers are designed
to exploit these properties so that the recognition accuracy on
data acquired by CS is approximately the same as that on data
acquired by traditional sampling. In this work, we discuss a group of such classifiers, which are robust to data acquired in this manner and thus fall under the category of CC.
There is a basic difference that separates CC from conven-
tional classification methods. In conventional classification, the
data are acquired by traditional (Nyquist) sampling. Once all
the data are obtained, a data-dependent dimensionality reduc-
tion technique is employed; data acquisition and dimensionality
reduction are disjoint activities. CC operates on data acquired
by a CS technique, where dimensionality reduction occurs
simultaneously with data acquisition. Thus, CC works with
a dimensionality reduction method that is data independent, whereas the dimensionality reduction techniques in traditional classification are data dependent (e.g., principal component analysis, linear discriminant analysis, etc.).

Manuscript received May 26, 2009; revised November 30, 2009; accepted December 7, 2009. Date of publication January 26, 2010; date of current version September 15, 2010. This paper was recommended by Associate Editor F. Karray. The authors are with the Department of Electrical and Computer Engineering, The University of British Columbia, Vancouver, BC V6T 1Z4, Canada (e-mail: angshulm@ece.ubc.ca; rababw@ece.ubc.ca). Digital Object Identifier 10.1109/TSMCB.2009.2038493
For some practical situations, data-dependent dimensionality
reduction methods are not efficient. Consider a practical
scenario of face authentication in a bank or an office. In a bank,
new clients are added daily to the database, and in some offices,
employees are also added on a regular basis. Suppose that, at
a certain given time, face images of 200 people are available,
and following conventional face recognition methods (e.g.,
eigenface and Fisherface), a data-dependent dimensionality
reduction is employed, resulting in a high- to low-dimensional
projection matrix. When the images of ten more people are added (e.g., the next day), this projection matrix must be recalculated for all 210 people.
Unfortunately, there is no way for the old projection matrix
to be updated by the new data (reducing the complexity of
such updates is an active area of research [15], [16]). For such
cases, a data-independent dimensionality reduction method
is desirable. Such a scenario can easily be handled by CC.
CC uses a random projection (RP) matrix for dimensionality
reduction. The projection matrix is data independent (it can
be a Gaussian- or a Bernoulli-type random matrix or a partial
Fourier matrix). Compressive classifiers are data independent in the sense that, unlike support vector machines (SVMs) or artificial neural networks (ANNs), they do not require retraining whenever new data are added.
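To make the data-independent nature of RP concrete, the following sketch (using NumPy; the dimensions, function names, and variable names are illustrative, not from the paper) constructs Gaussian- and Bernoulli-type RP matrices and projects a high-dimensional sample to a lower dimension:

```python
import numpy as np

def gaussian_rp(m, n, rng):
    # Entries i.i.d. N(0, 1/m), a common normalization for RP matrices.
    return rng.standard_normal((m, n)) / np.sqrt(m)

def bernoulli_rp(m, n, rng):
    # Entries +/- 1/sqrt(m) with equal probability.
    return rng.choice([-1.0, 1.0], size=(m, n)) / np.sqrt(m)

rng = np.random.default_rng(0)
n, m = 4096, 256            # original and reduced dimensionality (illustrative)
A = gaussian_rp(m, n, rng)  # data independent: built once, never updated
B = bernoulli_rp(m, n, rng)

x = rng.standard_normal(n)  # a high-dimensional sample (e.g., a face image)
y = A @ x                   # its low-dimensional projection
print(y.shape)              # (256,)
```

Because A does not depend on the training data, adding new clients to the database leaves the projection matrix unchanged, which is precisely the advantage over data-dependent methods described above.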
Dimensionality reduction by RP (i.e., CS data acquisition) [41] gives good results only if the classifier relies on a distance measure (e.g., Euclidean or cosine). Consequently, the nearest neighbor (NN) classifier is robust to such
randomly projected data and can be used as a compressive
classifier. Other studies have shown empirically that RP can
also be used in conjunction with certain ANNs [4] and SVMs
[5]. However, both ANNs and SVMs have a data-dependent
training phase, i.e., they need to be retrained whenever new data
are added. As a result, ANNs or SVMs are not computationally
efficient solutions to the aforementioned problem. Hence, we
will not consider these classifiers in this work. The idea behind
CC is to provide data-independent solutions for dimensionality
reduction and classification problems.
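The reason distance-based classifiers such as NN survive RP is that random projections approximately preserve pairwise Euclidean distances (in the spirit of the Johnson–Lindenstrauss lemma). A minimal numerical check, with dimensions chosen by us for illustration, might look like:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 4096, 512
A = rng.standard_normal((m, n)) / np.sqrt(m)  # Gaussian RP matrix

# Two high-dimensional samples and the distance between them,
# before and after random projection.
u, v = rng.standard_normal(n), rng.standard_normal(n)
d_orig = np.linalg.norm(u - v)
d_proj = np.linalg.norm(A @ (u - v))

# The ratio stays close to 1 with high probability, so the NN
# decision (which compares such distances) is rarely changed by RP.
ratio = d_proj / d_orig
print(ratio)
```

The distortion shrinks as the reduced dimension m grows, which is why the recognition accuracy on randomly projected data can approach that on the original data.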
Traditionally, the training phase is assumed to be offline, so the constraints on training time and computation are weak. In this case, the existing sophisticated methods related
to dimensionality reduction [6]–[12] and classification [13],
[14] can be employed. It should be mentioned that some effort
in online training is discernible in current face recognition
research [17]. Traditionally, it was also assumed that the training samples are fixed, i.e., that they do not change with time. However, practical scenarios dictate updating
1083-4419/$26.00 © 2010 IEEE