FAST PRINCIPAL COMPONENT ANALYSIS USING EIGENSPACE MERGING
Liang Liu¹, Yunhong Wang², Qian Wang¹, Tieniu Tan¹
¹ National Laboratory of Pattern Recognition,
Institute of Automation, Chinese Academy of Sciences, Beijing, China
² School of Computer Science and Engineering,
Beihang University, Beijing, China
ABSTRACT
In this paper, we propose a fast algorithm for Principal Component
Analysis (PCA) that deals with large, high-dimensional data sets. A
large data set is first divided into several small data sets. The
traditional PCA method is then applied to each small data set,
yielding several eigenspace models, one per small data set. Finally,
these eigenspace models are merged into a single eigenspace model
that contains the PCA result of the original data set. Experiments on
the FERET data set show that this algorithm is much faster than the
traditional PCA method, while the principal components and the
reconstruction errors are almost the same as those given by the
traditional method.
Index Terms— principal component analysis, eigenspace
merging.
1. INTRODUCTION
PCA (Principal Component Analysis) is widely used in dimension
reduction, feature extraction, image compression, etc. In the 1990s,
PCA was applied to face recognition and had a profound influence in
this field. The problem of PCA can be formulated as follows.
For an m × n matrix D, each column can be viewed as a point in an
m-dimensional linear space. The task of PCA is to find the center of
the n points and c principal orthonormal vectors which span an
eigenspace¹. In many applications, c is much smaller than both m
and n.
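As a concrete illustration, this formulation can be sketched in Python with NumPy via the thin SVD of the centered data (a minimal sketch of standard PCA, not the paper's implementation; matrix sizes and variable names are our own):

```python
import numpy as np

def pca(D, c):
    """PCA of an m x n data matrix D whose columns are points.

    Returns the center of the n points, the c principal orthonormal
    vectors (columns of U) spanning the eigenspace, and the variance
    along each principal axis.
    """
    x_bar = D.mean(axis=1, keepdims=True)          # center of the n points
    # Thin SVD of the centered data yields the principal axes directly.
    U, s, _ = np.linalg.svd(D - x_bar, full_matrices=False)
    lam = s**2 / D.shape[1]                        # variances per axis
    return x_bar, U[:, :c], lam[:c]

# Example: 100-dimensional points, keep c = 5 components.
D = np.random.default_rng(0).normal(size=(100, 500))
x_bar, U, lam = pca(D, 5)
```

The returned columns of U are orthonormal and the variances come out in descending order, matching the role of U and Λ in the eigenspace model used later in the paper.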
For the traditional PCA method [1], the time complexity² is
O(mn · min(m, n)/2) and the space complexity is O(mn). When m and n
are very large, both the time complexity and the space complexity can
be prohibitive in practice. In this paper, a fast algorithm with a
possible loss of precision is proposed. For this algorithm, the time
complexity is O((√6 + 1)cmn) and the space complexity is O(√6 · cm).
These make the task much easier.
The proposed method can be viewed as an application of
eigenspace merging [2, 3, 4]. The algorithm of eigenspace
¹ An eigenspace is an affine subspace of the original m-dimensional space.
² For simplicity, assume that m ≫ n or m ≪ n.
merging was originally used to merge two eigenspaces with-
out storing covariance matrix or original data. In this paper,
we shall show that eigenspace merging can be used to design
a fast algorithm for PCA. We will also analyze the error bound
introduced by the proposed algorithm.
The remainder of this paper is organized as follows. In
Section 2, a fast algorithm for PCA is proposed and discussed
in detail. In Section 3, some experimental results are pre-
sented. Conclusions are drawn in Section 4.
2. FAST PCA USING EIGENSPACE MERGING
In Section 2.1, we give a description of eigenspace model.
Section 2.2 gives a brief introduction about eigenspace merg-
ing. In Section 2.3, we will present the algorithm of Fast PCA
in detail. In Section 2.4, we analyze the error bound of the
proposed algorithm.
2.1. Eigenspace model description
An eigenspace model is a structure which contains four parameters,
namely Ω = (x, U, Λ, N) [5], where x is the center of the eigenspace,
U is a matrix whose columns are orthonormal bases of the eigenspace,
namely the eigenvectors, Λ is a diagonal matrix whose diagonal
elements are the variances along each principal axis, namely the
eigenvalues, and N is the number of samples used to construct this
eigenspace.
In Section 2.2, we shall see that this model is quite conve-
nient for eigenspace merging.
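The four-parameter model Ω = (x, U, Λ, N) maps directly onto a small data structure; a minimal sketch in Python (the class and field names are our own, not from the paper):

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class EigenspaceModel:
    """Eigenspace model Omega = (x, U, Lambda, N)."""
    x: np.ndarray    # center of the eigenspace, shape (m, 1)
    U: np.ndarray    # orthonormal eigenvectors as columns, shape (m, q)
    lam: np.ndarray  # eigenvalues (variance per principal axis), shape (q,)
    N: int           # number of samples used to build the model

def fit_model(D, c):
    """Build an eigenspace model from an m x n matrix (columns = samples)."""
    x = D.mean(axis=1, keepdims=True)
    U, s, _ = np.linalg.svd(D - x, full_matrices=False)
    return EigenspaceModel(x, U[:, :c], (s**2 / D.shape[1])[:c], D.shape[1])

model = fit_model(np.random.default_rng(1).normal(size=(64, 200)), 10)
```

Storing Λ as the vector of diagonal entries (rather than a full diagonal matrix) keeps the model at O(mq) memory, which is what makes the merging step in Section 2.2 cheap.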
2.2. A brief introduction about eigenspace merging
Skarbek [2] developed an algorithm for eigenspace merging which is
more concise than Hall's method [3]. Neither method needs to store
the covariance matrix of the previous training samples. Given two
eigenspace models Ω₁ and Ω₂, eigenspace merging finds the eigenspace
model Ω for the union of the original data sets, assuming that the
original data is no longer available.
If there are q₁ and q₂ eigenvectors in Ω₁ and Ω₂ respectively and we
keep c eigenvectors in Ω, the time complexity
VI - 457 1-4244-1437-7/07/$20.00 ©2007 IEEE ICIP 2007