IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 27, NO. 12, PAGES 1945-1959, DECEMBER 2005

Generalized Principal Component Analysis (GPCA)

René Vidal, Member, IEEE, Yi Ma, Member, IEEE, Shankar Sastry, Fellow, IEEE

Abstract— This paper presents an algebro-geometric solution to the problem of segmenting an unknown number of subspaces of unknown and varying dimensions from sample data points. We represent the subspaces with a set of homogeneous polynomials whose degree is the number of subspaces and whose derivatives at a data point give normal vectors to the subspace passing through the point. When the number of subspaces is known, we show that these polynomials can be estimated linearly from the data, hence subspace segmentation is reduced to classifying one point per subspace. We select these points optimally from the data set by minimizing a certain distance function, thus dealing automatically with moderate noise in the data. A basis for the complement of each subspace is then recovered by applying standard PCA to the collection of derivatives (normal vectors). Extensions of GPCA that deal with data in a high-dimensional space and with an unknown number of subspaces are also presented. Our experiments on low-dimensional data show that GPCA outperforms existing algebraic algorithms based on polynomial factorization and provides a good initialization to iterative techniques such as K-subspace and Expectation Maximization. We also present applications of GPCA to computer vision problems such as face clustering, temporal video segmentation, and 3-D motion segmentation from point correspondences in multiple affine views.

Index Terms— Principal component analysis (PCA), subspace segmentation, Veronese map, dimensionality reduction, temporal video segmentation, dynamic scenes and motion segmentation.
I. INTRODUCTION

Principal Component Analysis (PCA) [12] refers to the problem of fitting a linear subspace $S \subset \mathbb{R}^D$ of unknown dimension $d < D$ to $N$ sample points $\{x_j\}_{j=1}^N$ in $S$. This problem shows up in a variety of applications in many fields, e.g., pattern recognition, data compression, regression, image processing, etc., and can be solved in a remarkably simple way from the singular value decomposition (SVD) of the (mean-subtracted) data matrix $[x_1, x_2, \ldots, x_N] \in \mathbb{R}^{D \times N}$.¹ With noisy data, this linear algebraic solution has the geometric interpretation of minimizing the sum of the squared distances from the (noisy) data points $x_j$ to their projections $\tilde{x}_j$ in $S$.

René Vidal: 308B Clark Hall, 3400 N Charles Street, Baltimore, MD 21218. Tel: (410) 516-7306, Fax: (410) 516-4594, E-mail: rvidal@cis.jhu.edu. Yi Ma: 145 Coordinated Science Laboratory, 1308 West Main Street, Urbana, IL 61801. Tel: (217) 244-0871, Fax: (217) 244-2352, E-mail: yima@uiuc.edu. Shankar Sastry: 514 Cory Hall, Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA 94720. Tel: (510) 642-1857, Fax: (510) 643-2356, E-mail: sastry@eecs.berkeley.edu.

The authors thank Drs. Jacopo Piazzi and Kun Huang for their contributions to this work, and Drs. Frederik Schaffalitzky and Robert Fossum for insightful discussions on the topic. This work was partially supported by Hopkins WSE startup funds, UIUC ECE startup funds, and by grants NSF CAREER IIS-0347456, NSF CAREER IIS-0447739, NSF CRS-EHS-0509151, ONR YIP N00014-05-1-0633, ONR N00014-00-1-0621 and DARPA F33615-98-C-3614.

¹In the context of stochastic signal processing, PCA is also known as the Karhunen-Loeve transform [18]; in the applied statistics literature, SVD is also known as the Eckart and Young decomposition [4].

Fig. 1. Data points drawn from the union of one plane and one line (through the origin $o$) in $\mathbb{R}^3$.
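The SVD-based solution described above can be sketched in a few lines. The following is a minimal NumPy illustration under our own naming, not the paper's implementation:

```python
import numpy as np

# PCA via SVD of the mean-subtracted data matrix (illustrative sketch).
# Fit a d-dimensional subspace S to N points in R^D (one point per
# column) and orthogonally project each point onto S.

def pca_fit(X, d):
    """Return the sample mean and a basis for the best-fit d-dim subspace."""
    mu = X.mean(axis=1, keepdims=True)      # D x 1 sample mean
    U, s, Vt = np.linalg.svd(X - mu)        # SVD of mean-subtracted data
    B = U[:, :d]                            # top-d left singular vectors span S
    return mu, B

def pca_project(X, mu, B):
    """Orthogonal projections x~_j of the (noisy) points x_j onto S."""
    return mu + B @ (B.T @ (X - mu))

# Noisy samples near a 1-D subspace (a line through the origin) in R^3.
rng = np.random.default_rng(0)
t = rng.standard_normal(200)
X = np.outer([1.0, 2.0, -1.0], t) + 0.01 * rng.standard_normal((3, 200))

mu, B = pca_fit(X, d=1)
Xp = pca_project(X, mu, B)
# The projections minimize the sum of squared distances ||x_j - x~_j||^2.
```

With data this close to noise-free, the recovered basis matches the generating direction up to sign.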
The objective of subspace segmentation is to identify the normal vectors $b_{11}$, $b_{12}$, and $b_2$ to each one of the subspaces from the data.

In addition to these algebraic and geometric interpretations, PCA can also be understood in a probabilistic manner. In Probabilistic PCA [20] (PPCA), the noise is assumed to be drawn from an unknown distribution, and the problem becomes one of identifying the subspace and distribution parameters in a maximum-likelihood sense. When the noise distribution is Gaussian, the algebro-geometric and probabilistic interpretations coincide [2]. However, when the noise distribution is non-Gaussian, the solution to PPCA is no longer linear, as shown in [2], where PCA is generalized to arbitrary distributions in the exponential family.

Another extension of PCA is nonlinear principal components (NLPCA) or Kernel PCA (KPCA), which is the problem of identifying a nonlinear manifold from sample points. The standard solution to NLPCA [16] is based on first embedding the data into a higher-dimensional feature space $F$ and then applying standard PCA to the embedded data. Since the dimension of $F$ can be large, a more practical solution is obtained from the eigenvalue decomposition of the so-called kernel matrix, hence the name KPCA. One of the disadvantages of KPCA is that, in practice, it is difficult to determine which kernel function to use, because the choice of the kernel naturally depends on the nonlinear structure of the manifold to be identified. In fact, learning kernels is an active topic of research in machine learning. To the best of our knowledge, our work is the first to prove analytically that the Veronese map (a polynomial embedding) is the natural embedding for data lying in a union of multiple subspaces.

In this paper, we consider the following alternative extension of PCA to the case of data lying in a union of subspaces, as illustrated in Figure 1 for two subspaces of $\mathbb{R}^3$.
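For concreteness, the kernel-matrix route to KPCA mentioned above can be sketched as follows. The kernel choice and all names here are our own assumptions for illustration; as noted, picking the right kernel is exactly the hard part in general:

```python
import numpy as np

# KPCA via the eigendecomposition of the centered kernel (Gram) matrix,
# avoiding any explicit computation in the high-dimensional feature space.

def kpca(X, kernel, d):
    """X is D x N (one point per column); returns the top-d kernel components (d x N)."""
    N = X.shape[1]
    K = kernel(X, X)                        # N x N kernel matrix
    J = np.eye(N) - np.ones((N, N)) / N     # centering operator in feature space
    Kc = J @ K @ J
    w, V = np.linalg.eigh(Kc)               # eigenvalues in ascending order
    w, V = w[::-1][:d], V[:, ::-1][:, :d]   # keep the d largest
    A = V / np.sqrt(np.maximum(w, 1e-12))   # scale the eigenvectors
    return (Kc @ A).T                       # nonlinear principal components

# A homogeneous degree-2 polynomial kernel (an assumption for this demo).
poly2 = lambda X, Y: (X.T @ Y) ** 2

rng = np.random.default_rng(1)
X = rng.standard_normal((3, 50))            # 50 sample points in R^3
Z = kpca(X, poly2, d=2)                     # 2 nonlinear components per point
```

Only the $N \times N$ kernel matrix is ever formed, regardless of how large the feature space $F$ induced by the kernel is.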
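The union-of-subspaces idea can be previewed on the Figure 1 configuration. The sketch below is our own toy construction, not the paper's algorithm: it samples a plane and a line through the origin in $\mathbb{R}^3$, finds the degree-2 homogeneous polynomials that vanish on the data linearly, via the nullspace of a degree-2 Veronese (monomial) embedding, and reads off normal vectors from the polynomial gradients at a data point:

```python
import numpy as np
from itertools import combinations_with_replacement

# Toy preview of the GPCA idea: points on the plane z = 0 union the
# z-axis in R^3 satisfy degree-2 homogeneous polynomials p(x) = c' v(x),
# where v is the degree-2 Veronese embedding. The coefficient vectors c
# lie in the nullspace of the embedded data matrix, and the gradient of
# p at a data point is normal to the subspace containing that point.

def veronese2(X):
    """Degree-2 Veronese map of each column of X, plus the monomial index pairs."""
    idx = list(combinations_with_replacement(range(X.shape[0]), 2))
    return np.array([X[i] * X[j] for i, j in idx]), idx

def poly_gradient(c, idx, x):
    """Gradient at x of the polynomial sum_ij c_ij * x_i * x_j."""
    g = np.zeros_like(x)
    for cij, (i, j) in zip(c, idx):
        g[i] += cij * x[j]
        g[j] += cij * x[i]
    return g

rng = np.random.default_rng(2)
plane = np.vstack([rng.standard_normal((2, 60)), np.zeros((1, 60))])  # z = 0
line = np.outer([0.0, 0.0, 1.0], rng.standard_normal(40))             # z-axis
X = np.hstack([plane, line])

V, idx = veronese2(X)              # 6 x 100 embedded data matrix
_, s, Wt = np.linalg.svd(V.T)      # V' c = 0  <=>  p vanishes on all data
C = Wt[-2:]                        # here the nullspace is 2-dimensional

# Pick one well-conditioned sample point from each subspace.
jp = int(np.argmax(np.linalg.norm(plane[:2], axis=0)))
jl = int(np.argmax(np.abs(line[2])))
G_plane = np.array([poly_gradient(c, idx, plane[:, jp]) for c in C])
G_line = np.array([poly_gradient(c, idx, line[:, jl]) for c in C])
# G_plane rows are parallel to the plane normal b2 = (0, 0, 1);
# G_line rows span the complement of the line, i.e., span{b11, b12}.
```

Applying standard PCA to such gradient collections then yields a basis for the complement of each subspace, as the abstract describes.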