IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, VOL. 2, NO. 1, MARCH 2007

A Parameter-Free Framework for General Supervised Subspace Learning

Shuicheng Yan, Member, IEEE, Jianzhuang Liu, Senior Member, IEEE, Xiaoou Tang, Senior Member, IEEE, and Thomas S. Huang, Life Fellow, IEEE

Abstract—Supervised subspace learning techniques have been extensively studied in the biometrics literature; however, little work has been dedicated to: 1) how to automatically determine the subspace dimension in the context of supervised learning, and 2) how to explicitly guarantee the classification performance on a training set. In this paper, building on our earlier work on a unified subspace learning framework, we present a general framework, called parameter-free graph embedding (PFGE), that solves these two problems by posing the general supervised subspace learning task as a semidefinite programming problem. The positive semidefinite feature Gram matrix, namely the product of the transformation matrix and its transpose, is derived by optimizing a trace-difference form of an objective function extended from our earlier work, under constraints that guarantee class homogeneity within the neighborhood of each datum. The subspace dimension and the feature weights are then obtained simultaneously via the singular value decomposition of the feature Gram matrix. In addition, to reduce the computational complexity, a Kronecker product approximation of the feature Gram matrix is proposed that exploits the natural matrix form of image pixels. Experiments on simulated and real-world data demonstrate the capability of the new PFGE framework to estimate the subspace dimension for supervised learning, as well as its superior classification performance over traditional subspace learning algorithms.

Index Terms—Semidefinite programming, subspace dimension determination, subspace learning.

Manuscript received May 22, 2006; revised September 19, 2006. This work was supported in part by DTO under Contract NBCHC060160 and in part by the Research Grants Council of the Hong Kong SAR under Project CUHK 414306. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Vijaya Kumar Bhagavatula.

S. Yan and T. S. Huang are with the Beckman Institute, University of Illinois at Urbana-Champaign, Urbana, IL 61801 USA (e-mail: scyan@ifp.uiuc.edu; huang@ifp.uiuc.edu). J. Liu and X. Tang are with the Department of Information Engineering, The Chinese University of Hong Kong, Shatin, NT, Hong Kong, China (e-mail: jzliu@ie.cuhk.edu.hk; xtang@ie.cuhk.edu.hk).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TIFS.2006.890313

I. INTRODUCTION

Techniques for subspace learning [10], [17], [31], [23], [28] have been actively studied for decades. Most of them, such as principal component analysis (PCA) [12], [22], linear discriminant analysis (LDA) [2], [9], [31], and marginal Fisher analysis (MFA) [29], are solved with spectral-analysis methods [6], [9]. The supervised techniques typically optimize objective functions that characterize discriminative power in the sense of expectation or under certain assumptions on the data distribution, and they cannot ensure that the training samples are best classified by the nearest neighbor method in the resulting low-dimensional feature space, especially when the number of training samples is small. Moreover, how to automatically determine the dimension of the desired low-dimensional feature space is seldom discussed in previous algorithms for supervised dimensionality reduction. Hence, the dimension is often set intuitively, or all possible subspace dimensions are explored in order to find the one optimal for classification, which is impractical and easily overfits the specific testing data.
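To make the role of the feature Gram matrix concrete, the following minimal numpy sketch (toy data of our own, not the paper's solver) builds the Gram matrix M = W Wᵀ from a hypothetical 10×3 transformation matrix W and recovers both the subspace dimension and the feature weights from its spectrum. Since M is symmetric positive semidefinite, its eigendecomposition coincides with the singular value decomposition used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical transformation matrix W mapping 10-D data into a
# 3-D subspace; the feature Gram matrix M = W @ W.T then has rank 3.
W = rng.standard_normal((10, 3))
M = W @ W.T                      # positive semidefinite by construction

# eigh handles the symmetric case; eigenvalues come back ascending.
eigvals, eigvecs = np.linalg.eigh(M)
eigvals = eigvals[::-1]          # reorder to descending
eigvecs = eigvecs[:, ::-1]

# Subspace dimension = number of (numerically) nonzero eigenvalues;
# the square roots of those eigenvalues act as the feature weights.
dim = int(np.sum(eigvals > 1e-8 * eigvals[0]))
weights = np.sqrt(eigvals[:dim])

print(dim)   # 3: the rank of M recovers the subspace dimension
```

This is exactly the convenience the abstract points at: once M is learned, no dimension parameter is needed, because the rank of M supplies it.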
In the unsupervised learning literature, intrinsic data dimension estimation [13], [16], [11] has been widely discussed over the past decades. Kegl [13] utilized the geometric properties of the data to estimate the intrinsic data dimension in a nonparametric way. Hu [11] studied automatic subspace dimension determination under the framework of Bayesian Ying–Yang (BYY) harmony learning. Lin et al. [16] estimated the intrinsic data dimension by constructing a Riemannian manifold in the form of a simplicial complex, with the dimension defined as the maximal dimension of its simplices. Brito et al. [4] treated as a random variable the average reach of vertices in a k-nearest-neighbors graph associated with the interpoint distance matrix, and showed that this variable can accurately (from a probabilistic viewpoint) identify the unknown dimension at low computational cost. Brito [5] discussed applying linear combinations of the degree frequencies in the minimal spanning tree to the problem of identifying the appropriate dimension of a data set from its interpoint distance matrix. Costa [7] and Yang [30] studied the data dimension estimation problem by using trees to approximate manifold structures. All of these methods focus on unsupervised learning and do not utilize the data-class labels that are available in supervised subspace learning.

Motivated by the above observations, we present a parameter-free framework for general supervised subspace learning, following our previous work on graph embedding as a unified framework for subspace learning [29]. The new framework searches for a low-dimensional feature space in which the neighboring points of each datum share the same class label, which is optimal in the sense of nearest neighbor classification. The whole framework, referred to as parameter-free graph embedding (PFGE), consists of the following steps.
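The neighborhood class-homogeneity property that PFGE aims for can be measured directly in any feature space. The sketch below (a hypothetical helper `knn_class_homogeneity` on toy Gaussian data, not part of the paper) computes the fraction of training points whose k nearest neighbors all carry the same label, which is the quantity the PFGE constraints drive toward one.

```python
import numpy as np

def knn_class_homogeneity(X, y, k=3):
    """Fraction of points whose k nearest neighbors all share its label.

    PFGE's constraints aim to make this equal 1 in the learned subspace;
    here we simply measure it for given features X and labels y.
    """
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(D, np.inf)          # a point is not its own neighbor
    idx = np.argsort(D, axis=1)[:, :k]   # indices of the k nearest neighbors
    homogeneous = np.all(y[idx] == y[:, None], axis=1)
    return homogeneous.mean()

# Two well-separated Gaussian classes: every neighborhood is pure.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(5, 0.1, (20, 2))])
y = np.repeat([0, 1], 20)
print(knn_class_homogeneity(X, y, k=3))   # 1.0 for this separable toy data
```

A homogeneity of 1 means every training point is correctly classified by its k-nearest-neighbor vote, which is the explicit training-set guarantee the introduction contrasts with expectation-based criteria.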
First, instead of directly computing the transformation matrix for dimensionality reduction, we search for the feature Gram matrix (i.e., the product of the transformation matrix and its transpose). Then, the ratio form of the objective function in the graph embedding framework [29] is transformed into a difference form in PFGE. After that, the feature Gram matrix is learned by posing the supervised subspace learning task as a semidefinite programming problem.
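The paper learns the Gram matrix via semidefinite programming; as a simpler illustration of why a trace-difference objective makes the dimension fall out automatically, the sketch below uses a spectral relaxation with hypothetical between- and within-class scatter matrices `S_b` and `S_w` standing in for the graph-embedding terms (this substitution is ours, not the paper's formulation). Maximizing tr(Wᵀ(S_b − S_w)W) over orthonormal W keeps exactly the eigenvectors with positive eigenvalues, so no target dimension needs to be chosen in advance.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical scatter matrices for 6-D features, made symmetric PSD;
# in the paper these roles are played by graph Laplacian terms.
A = rng.standard_normal((6, 6)); S_b = A @ A.T   # "between-class" stand-in
B = rng.standard_normal((6, 6)); S_w = B @ B.T   # "within-class" stand-in

# Spectral relaxation of the trace-difference objective:
# maximize tr(W.T @ (S_b - S_w) @ W) subject to W having orthonormal
# columns. Each eigenvector contributes its eigenvalue to the trace,
# so the optimum keeps exactly the directions with positive eigenvalues.
eigvals, eigvecs = np.linalg.eigh(S_b - S_w)
keep = eigvals > 0
dim = int(keep.sum())            # data-driven subspace dimension
W = eigvecs[:, keep]             # transformation matrix, shape (6, dim)

print(dim, W.shape)
```

The SDP route in the paper plays the analogous role for the Gram matrix itself: the difference form turns dimension selection into reading off the sign pattern of a spectrum rather than tuning a parameter.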