FAST PRINCIPAL COMPONENT ANALYSIS USING EIGENSPACE MERGING

Liang Liu 1, Yunhong Wang 2, Qian Wang 1, Tieniu Tan 1
1 National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China
2 School of Computer Science and Engineering, Beihang University, Beijing, China

ABSTRACT

In this paper, we propose a fast algorithm for Principal Component Analysis (PCA) that deals with large high-dimensional data sets. A large data set is first divided into several small data sets. The traditional PCA method is then applied to each small data set, yielding one eigenspace model per small data set. Finally, these eigenspace models are merged into a single eigenspace model that contains the PCA result of the original data set. Experiments on the FERET data set show that this algorithm is much faster than the traditional PCA method, while the principal components and the reconstruction errors are almost the same as those given by the traditional method.

Index Terms— principal component analysis, eigenspace merging

1. INTRODUCTION

PCA (Principal Component Analysis) is widely used in dimension reduction, feature extraction, image compression, etc. In the 1990s, PCA was applied to face recognition and had a profound influence on that field.

The problem of PCA can be formulated as follows. For an m × n matrix D, each column can be viewed as a point in m-dimensional linear space. The task of PCA is to find the center of the n points and c principal orthonormal vectors which span an eigenspace.¹ In many applications, c is much smaller than both m and n.

For the traditional PCA method [1], the time complexity² is O(mn · min(m, n)/2) and the space complexity is O(mn). When m and n are very large, both the time complexity and the space complexity can be prohibitive in practice. In this paper, a fast algorithm with a possible loss of precision is proposed.
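As a point of reference for these complexity figures, batch PCA on the full matrix can be sketched with a thin SVD of the centered data. This is a minimal illustration in numpy, not the exact implementation of [1]; the function name is ours:

```python
import numpy as np

def batch_pca(D, c):
    """Batch PCA of an m x n data matrix D whose columns are samples.

    Returns the eigenspace model (x_bar, U, Lam, N): the sample mean,
    the c leading principal directions, the corresponding variances,
    and the number of samples.
    """
    m, n = D.shape
    x_bar = D.mean(axis=1, keepdims=True)   # center of the n points
    # Thin SVD of the centered data; its cost grows like mn * min(m, n),
    # which is the source of the O(mn * min(m, n)/2) figure above.
    U, s, _ = np.linalg.svd(D - x_bar, full_matrices=False)
    Lam = np.diag(s[:c] ** 2 / n)           # variance along each principal axis
    return x_bar, U[:, :c], Lam, n
```

Both the m × n input and the SVD workspace must be held in memory at once, which is the O(mn) space cost that motivates the block-wise approach below.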
For the proposed algorithm, the time complexity is O((√6 + 1)cmn) and the space complexity is O(√6 · cm). These make the task much more tractable.

The proposed method can be viewed as an application of eigenspace merging [2, 3, 4]. The algorithm of eigenspace merging was originally used to merge two eigenspaces without storing the covariance matrix or the original data. In this paper, we shall show that eigenspace merging can be used to design a fast algorithm for PCA. We will also analyze the error bound introduced by the proposed algorithm.

¹An eigenspace is an affine subspace of the original m-dimensional space.
²For simplicity, assume that m ≫ n or m ≪ n.

The remainder of this paper is organized as follows. In Section 2, the fast PCA algorithm is proposed and discussed in detail. In Section 3, experimental results are presented. Conclusions are drawn in Section 4.

2. FAST PCA USING EIGENSPACE MERGING

Section 2.1 describes the eigenspace model. Section 2.2 gives a brief introduction to eigenspace merging. Section 2.3 presents the fast PCA algorithm in detail. Section 2.4 analyzes the error bound of the proposed algorithm.

2.1. Eigenspace model description

An eigenspace model is a structure containing four parameters, namely Ω = (x̄, U, Λ, N) [5], where x̄ is the center of the eigenspace; U is a matrix whose columns are orthonormal bases of the eigenspace, namely the eigenvectors; Λ is a diagonal matrix whose diagonal elements are the variances along each principal axis, namely the eigenvalues; and N is the number of samples used to construct the eigenspace. In Section 2.2, we shall see that this model is quite convenient for eigenspace merging.

2.2. A brief introduction to eigenspace merging

Skarbek [2] developed an eigenspace merging algorithm that is more concise than Hall's method [3]. Neither method needs to store the covariance matrix of the previous training samples.
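To make the merging step concrete, the following is a simplified sketch in the spirit of [2, 3], not Skarbek's exact update: build an orthonormal basis spanning both subspaces plus the mean difference, express the pooled covariance of the union in that small basis, and re-diagonalize. The function and variable names are illustrative, not the paper's.

```python
import numpy as np

def merge_eigenspaces(model1, model2, c):
    """Merge two eigenspace models (x_bar, U, Lam, N), keeping c axes."""
    x1, U1, L1, N1 = model1
    x2, U2, L2, N2 = model2
    N = N1 + N2
    x = (N1 * x1 + N2 * x2) / N                # pooled mean

    # Orthonormal basis for span{U1, U2, x1 - x2} via QR,
    # pruning linearly dependent directions.
    S = np.hstack([U1, U2, x1 - x2])
    Q, R = np.linalg.qr(S)
    Phi = Q[:, np.abs(np.diag(R)) > 1e-10]

    # Pooled covariance of the union, projected into the small basis Phi:
    # C = (N1/N)(C1 + d1 d1^T) + (N2/N)(C2 + d2 d2^T), d_i = x_i - x.
    d1, d2 = Phi.T @ (x1 - x), Phi.T @ (x2 - x)
    P1, P2 = Phi.T @ U1, Phi.T @ U2
    G = (N1 / N) * (P1 @ L1 @ P1.T + np.outer(d1, d1)) \
      + (N2 / N) * (P2 @ L2 @ P2.T + np.outer(d2, d2))

    # Re-diagonalize the small r x r matrix and keep the top c axes.
    w, Rot = np.linalg.eigh(G)
    order = np.argsort(w)[::-1][:c]
    return x, Phi @ Rot[:, order], np.diag(w[order]), N
```

The eigendecomposition is performed on an r × r matrix with r ≤ q₁ + q₂ + 1, so the cost of a merge depends on the number of retained eigenvectors rather than on the number of original samples, which is what makes the divide-and-merge strategy below fast.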
Given two eigenspace models Ω₁ and Ω₂, eigenspace merging is used to find the eigenspace model Ω for the union of the original data sets, assuming that the original data is no longer available. If there are q₁ and q₂ eigenvectors in Ω₁ and Ω₂ respectively and we keep c eigenvectors in Ω, the time complexity

VI - 457 1-4244-1437-7/07/$20.00 ©2007 IEEE ICIP 2007