SEPARABLE PCA FOR IMAGE CLASSIFICATION
Yongxin Taylor Xi and Peter J. Ramadge
Dept. Electrical Engineering, Princeton University, Princeton NJ
ABSTRACT
As an alternative to standard PCA, matrix-based image di-
mensionality reduction methods have recently been proposed
and have gained attention due to reported computational ef-
ficiency and robust performance in classification. We unify
all of these methods through one concept: Separable Principal
Component Analysis (SPCA). We show that the proposed
matrix methods are either equivalent to, special cases of, or
approximations to SPCA. We include performance compar-
isons of the methods on two face data sets and a handwritten
digit data set. The empirical results indicate that two exist-
ing methods, BD-PCA and its variant NGLRAM, are very
good, efficiently computable, approximate solutions to prac-
tical SPCA problems.
Index Terms— Image classification, eigenvalues and
eigenfunctions, discrete transforms, image representations,
face recognition.
1. INTRODUCTION
Principal component analysis (PCA) is an important feature
selection method used in many image detection/classification
schemes. One prominent example is its successful applica-
tion in face detection and classification, e.g. [1, 2]. How-
ever, estimation of the PCA projection from data has some
limitations. First, its computational complexity makes it diffi-
cult to deal directly with high dimensional data, e.g. images.
Second, the number of examples available for the estimation
of the PCA projection is typically much smaller than the ambient
dimension of the data, and this can lead to overfitting of the
projection. In an effort to alleviate these problems in
image classification applications, several variations on stan-
dard PCA have recently been proposed [3, 4, 5, 6, 7]. These
schemes are reported to have reduced computational burden
and, when coupled with appropriate classifiers, to yield im-
proved and robust classification rates [3, 4, 8, 5]. We seek
to better understand the relationship of these algorithms with
standard methods.
Our main contribution is the unification of these methods
through Separable PCA (SPCA). SPCA seeks a separable ba-
sis of images that maximizes the variance of the coordinates
over the ensemble of data images. We show that each of the
above schemes is either equivalent to, a special case of, or an
approximation to SPCA. Specifically, 2DPCA [3] is an easily
solvable special case of SPCA. BD-PCA [4] and NGLRAM
[7] project the image data onto a separable basis. We give pre-
cise conditions under which BD-PCA is a solution of SPCA
and when these conditions are not satisfied, show that BD-
PCA and NGLRAM give very good approximate solutions to
SPCA. Finally, GLRAM [5], a method for obtaining low rank
approximations, is equivalent to SPCA. Thus SPCA unifies a
variety of prior proposals in the literature.
2. BACKGROUND
Let X denote a linear space and Y denote a finite set of labels.
Given a set {(x_k, y_k) ∈ X × Y, k = 1, ..., N} of training
examples (the x_k are instances, the y_k are labels), we want to design a
classifier h : X → Y that ‘best’ predicts the label of a new test
instance x ∈X . For example, each training instance might
be an m × n grey scale face image with the associated label
being the identifier of the corresponding individual.
The PCA approach to this problem uses the training data
{x_k}_{k=1}^N to determine a linear projection Q : X → R^d into a
lower dimensional space. Then the label information is used
to design a classifier h : R^d → Y. For example, this might be
a nearest neighbor classifier in the projected space.
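The two-stage pipeline above (an unsupervised projection Q followed by a supervised classifier h) can be sketched as follows. This is a minimal illustration on hypothetical two-class toy data, not the paper's experimental setup; the dimensions, class shift, and 1-nearest-neighbor rule are all assumptions chosen for brevity.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical toy problem: two classes of 50-dimensional instances.
d = 3                                    # projected dimension
train = rng.standard_normal((40, 50))
labels = np.array([0] * 20 + [1] * 20)
train[20:] += 2.0                        # shift class 1 away from class 0

# Stage 1: estimate the projection Q from the centered training data.
mean = train.mean(axis=0)
U, _, _ = np.linalg.svd((train - mean).T, full_matrices=False)
Q = U[:, :d]                             # columns span the PCA subspace
proj = (train - mean) @ Q                # training coordinates in R^d

# Stage 2: use the labels to build a classifier in the projected space.
def h(x):
    """1-nearest-neighbor classifier in the d-dimensional PCA subspace."""
    z = (x - mean) @ Q
    return labels[np.argmin(np.linalg.norm(proj - z, axis=1))]

print(h(rng.standard_normal(50) + 2.0))  # a test instance near class 1
```

Note that the projection is fit without the labels; only the second stage uses them.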
It will be helpful to review PCA when X = R^s, for some integer
s > 0. Without loss of generality, assume that the data
is centered, i.e., ∑_{k=1}^N x_k = 0. To select the PCA projection,
form the data matrix D = [x_1, x_2, ..., x_N]. The scatter
matrix (empirical covariance) is then DD^T = ∑_{k=1}^N x_k x_k^T ∈ R^{s×s}.
DD^T has at most N − 1 nonzero eigenvalues. Let w_j,
j = 1, ..., d, denote the first d eigenvectors ordered by eigenvalue,
largest to smallest. The PCA projection into R^d results
by setting P = [w_1, w_2, ..., w_d] and x̂_k = P^T x_k. In practice,
one computes P from an SVD D = UΣV^T, yielding
DD^T = UΣ^2 U^T and P = [u_1, ..., u_d], where the u_j are the
first d left singular vectors of D. For N ≪ s, the complexity
of computing P is O(sN^2) in time and O(sN) in space.
When each data point is an m × n grey scale image A_k,
PCA finds an ON set {W_j}_{j=1}^d of d principal eigenimages of
the empirical covariance function of the image data [9]. Image
A_k is then projected to its coordinates with respect to this
ON basis, i.e., â_{kj} = ⟨A_k, W_j⟩, j = 1, ..., d, where ⟨·, ·⟩ is
the standard inner product. It is convenient to compute these
eigenimages by exploiting an isometry between R^{m×n} and
1805 978-1-4244-2354-5/09/$25.00 ©2009 IEEE ICASSP 2009
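The coordinate computation â_{kj} = ⟨A_k, W_j⟩ can be sketched via the vectorization isometry: the Frobenius inner product of two m × n matrices equals the ordinary dot product of their vectorizations. The ensemble below is hypothetical random data, and the eigenimages are obtained by PCA on the vectorized images, purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical ensemble of N centered m-by-n "images".
m, n, N, d = 8, 6, 30, 4
A = rng.standard_normal((N, m, n))
A -= A.mean(axis=0)                      # center the ensemble

# vec isometry: reshape each image A_k into a column vec(A_k) of R^{mn}.
D = A.reshape(N, m * n).T                # columns are vec(A_k)
U, _, _ = np.linalg.svd(D, full_matrices=False)
W = U[:, :d].T.reshape(d, m, n)          # d principal eigenimages W_j

# Coordinates a_hat[k, j] = <A_k, W_j> via the Frobenius inner product.
a_hat = np.einsum('kmn,jmn->kj', A, W)

# The same numbers via dot products of vectorized images, confirming
# that the isometry preserves inner products.
assert np.allclose(a_hat, D.T @ U[:, :d])
```

Under this isometry, the matrix-image problem reduces exactly to the R^s case reviewed above with s = mn.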