Pattern Recognition 37 (2004) 781–788
www.elsevier.com/locate/patcog

Locality pursuit embedding

Wanli Min (a), Ke Lu (b), Xiaofei He (c)

(a) Department of Statistics, The University of Chicago, 5734 S University Ave., Chicago, IL 60637, USA
(b) School of Computer Science and Engineering, University of Electronic Science & Technology of China, Chengdu, Sichuan 610054, China
(c) Computer Science Department, The University of Chicago, 1100 E 58th Street, Chicago, IL 60637, USA

Received 26 March 2003; accepted 29 September 2003

Abstract

Dimensionality reduction techniques are widespread in pattern recognition research. Principal component analysis, one of the most popular methods in use, is optimal when the data points reside on a linear subspace. Nevertheless, it may fail to preserve the local structure if the data reside on some nonlinear manifold, and this local structure is indisputably important in many real applications, especially when nearest-neighbor search is involved. In this paper, we propose locality pursuit embedding, a linear algorithm that arises by solving a variational problem. It produces a linear embedding that respects the local geometrical structure described by the Euclidean distances. Some illustrative examples are presented along with applications to real data sets. © 2003 Published by Elsevier Ltd on behalf of Pattern Recognition Society.

Keywords: Locality preserving; Manifold learning; Principal component analysis; Tangent space; Dimension reduction

1. Introduction

Real data in the natural and social sciences are often very high dimensional. However, the underlying structure can in many cases be characterized by a small number of parameters. Reducing the dimensionality of such data is beneficial for visualizing the intrinsic structure, and it is also an important preprocessing step in many statistical pattern recognition problems.
Recently, there has been extensive interest in developing low-dimensional representations when the data arise from sampling a probability distribution on a manifold [1-5]. Classical techniques for manifold learning, such as PCA [6] and MDS [7], are designed to operate when the submanifold is embedded linearly or almost linearly in the observation space.

PCA finds a d-dimensional subspace of R^n which captures as much of the variation in the data set as possible. Specifically, given data X = {x_1, x_2, ..., x_m} with zero mean, it finds y_i = w^T x_i maximizing

\sum_{i=1}^{m} (y_i - \bar{y})^2,

where w is the transformation vector and \bar{y} = \sum_i y_i / m is the mean. Thus PCA builds a global linear model of the data (a d-dimensional hyperplane). For linearly embedded manifolds, PCA is guaranteed to discover the dimensionality of the manifold and to produce a compact representation in the form of an orthonormal basis. However, for data on a nonlinear submanifold embedded in the feature space, PCA has two problems. First, PCA has difficulty discovering the underlying structure. For example, the covariance matrix of data sampled from a helix in R^3 has full rank and thus three principal components, yet the helix is actually a one-dimensional manifold and can be parameterized by a single parameter. Second, the embedding given by PCA preserves only the global structure, while local structure is what matters in many real applications, especially when nearest-neighbor search is involved.

Classical MDS finds an embedding that preserves pairwise distances between data points. It is equivalent to PCA

Corresponding author. Tel.: +1-773-702-0959; fax: +1-773-702-8330. E-mail address: wmin@galton.uchicago.edu (W. Min).

0031-3203/$30.00 © 2003 Published by Elsevier Ltd on behalf of Pattern Recognition Society. doi:10.1016/j.patcog.2003.09.005
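The helix example can be checked numerically. The sketch below (a NumPy illustration, not part of the paper) samples points from a helix in R^3, centers them, and computes the covariance matrix: all three eigenvalues come out strictly positive even though the curve is intrinsically one-dimensional. It also verifies that the variance of the projection y_i = w^T x_i, which PCA maximizes, equals the leading eigenvalue when w is the top eigenvector.

```python
import numpy as np

# Sample m points from a helix in R^3; the curve is parameterized
# by the single parameter t, so it is a one-dimensional manifold.
m = 1000
t = np.linspace(0, 4 * np.pi, m)
X = np.column_stack([np.cos(t), np.sin(t), 0.1 * t])
X -= X.mean(axis=0)  # center the data, as PCA assumes zero mean

# Covariance matrix and its eigenvalues (the principal component variances).
cov = X.T @ X / m
eigvals = np.sort(np.linalg.eigvalsh(cov))[::-1]  # descending order
print(eigvals)  # three strictly positive eigenvalues: full rank

# Project onto the top eigenvector w; the variance of y = Xw is the
# quantity sum_i (y_i - ybar)^2 / m that PCA maximizes over unit w.
w = np.linalg.eigh(cov)[1][:, -1]  # eigenvector of the largest eigenvalue
y = X @ w
print(y.var())  # matches the largest eigenvalue up to numerical error
```

So PCA reports three components for the helix, illustrating the paper's first criticism: the global linear model cannot reveal that a single parameter suffices.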