IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, VOL. 2, NO. 1, MARCH 2007 69
A Parameter-Free Framework for General
Supervised Subspace Learning
Shuicheng Yan, Member, IEEE, Jianzhuang Liu, Senior Member, IEEE, Xiaoou Tang, Senior Member, IEEE,
and Thomas S. Huang, Life Fellow, IEEE
Abstract—Supervised subspace learning techniques have been
extensively studied in the biometrics literature; however, little
work has been dedicated to: 1) how to automatically determine the
subspace dimension in the context of supervised learning and
2) how to explicitly guarantee the classification performance on a
training set. In this paper, following our earlier work on a unified
subspace learning framework, we present a general framework,
called parameter-free graph embedding (PFGE), to solve the above
two problems by posing a general supervised subspace learning
task as a semidefinite programming problem.
The positive semidefinite feature Gram matrix, namely, the product
of the transformation matrix and its transpose, is derived by
optimizing a trace-difference form of an objective function extended
from that in our earlier work, with constraints that guarantee class
homogeneity within the neighborhood of each datum. Then, the
subspace dimension and the feature weights are simultaneously
obtained via the singular value decomposition of the feature Gram
matrix. In addition, to alleviate the computational complexity, a
Kronecker product approximation of the feature Gram matrix is
proposed by taking advantage of the natural matrix form of image
pixels. Experiments on simulated and real-world data demonstrate
the capability of the new PFGE framework in estimating the
subspace dimension for supervised learning, as well as its
superiority in classification performance over traditional
subspace learning algorithms.
Index Terms—Semidefinite programming, subspace dimension
determination, subspace learning.
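The pipeline summarized above can be illustrated numerically: once a positive semidefinite feature Gram matrix (the product of the transformation matrix and its transpose) is in hand, its eigendecomposition simultaneously yields the subspace dimension (the numerical rank) and a projection matrix whose column scales act as feature weights. The following numpy sketch is only an illustration of this decomposition step; in PFGE the Gram matrix comes from the semidefinite program, whereas here it is built synthetically.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic transformation matrix W (5-D input, 2-D subspace); in PFGE,
# the Gram matrix G would instead be the solution of the SDP.
W = rng.standard_normal((5, 2))
G = W @ W.T  # positive semidefinite feature Gram matrix G = W W^T

# Eigendecomposition of the symmetric PSD matrix G (equivalent to its
# singular value decomposition); reorder eigenvalues descending.
eigvals, eigvecs = np.linalg.eigh(G)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]

# Subspace dimension = numerical rank of G.
dim = int(np.sum(eigvals > 1e-10))

# Recovered projection: leading eigenvectors scaled by the square roots
# of the eigenvalues (the feature weights); W_rec @ W_rec.T reproduces G.
W_rec = eigvecs[:, :dim] * np.sqrt(eigvals[:dim])

print(dim)                              # 2
print(np.allclose(W_rec @ W_rec.T, G))  # True
```

Note that the recovered projection is unique only up to an orthogonal rotation of its columns, which is immaterial for nearest-neighbor classification since Euclidean distances are preserved.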
I. INTRODUCTION
TECHNIQUES for subspace learning [10], [17], [31], [23],
[28] have been actively studied for decades. Most of them,
such as principal component analysis (PCA) [12], [22], linear
discriminant analysis (LDA) [2], [9], [31], and marginal Fisher
analysis (MFA) [29], are solved with the spectral-analysis [6],
[9] methods. The supervised techniques often optimize objective
functions characterizing the discriminative power in the sense of
expectation or under certain assumptions on the data distribution,
and cannot ensure that the training samples are best classified
with the nearest neighbor method in the obtained low-dimensional
feature space, especially when the number of training samples is
small.
Manuscript received May 22, 2006; revised September 19, 2006. This work
was supported in part by DTO under Contract NBCHC060160 and in part by the
Research Grants Council of the Hong Kong SAR under Project CUHK 414306.
The associate editor coordinating the review of this manuscript and approving
it for publication was Prof. Vijaya Kumar Bhagavatula.
S. Yan and T. S. Huang are with the Beckman Institute, University of Illinois
at Urbana-Champaign, Urbana, IL 61801 USA (e-mail: scyan@ifp.uiuc.edu;
huang@ifp.uiuc.edu).
J. Liu and X. Tang are with the Department of Information Engineering,
the Chinese University of Hong Kong, Shatin, NT, Hong Kong, China (e-mail:
jzliu@ie.cuhk.edu.hk; xtang@ie.cuhk.edu.hk).
Color versions of one or more of the figures in this paper are available online
at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TIFS.2006.890313
How to automatically determine the dimension of the desired
low-dimensional feature space is seldom discussed in previous
algorithms for supervised dimensionality reduction. Hence,
the dimension is often set by intuition, or all possible subspace
dimensions are explored in search of the one optimal for
classification, which is computationally impractical and easily
overfits the specific test data. In the literature of unsupervised learning,
intrinsic data dimension estimation [13], [16], [11] has been
widely discussed in past decades. Kegl [13] utilized the geo-
metric properties of the data to estimate the intrinsic data
dimension in a nonparametric way. Hu [11] studied the auto-
matic subspace dimension determination under the framework
of Bayesian Ying–Yang (BYY) harmony learning. Lin et al.
[16] estimated the intrinsic data dimension by constructing a
Riemannian manifold in the form of a simplicial complex, and
the dimension is defined as the maximal dimension of its sim-
plices. Brito et al. [4] treated as a random variable the average
reach of vertices in a k-nearest-neighbors graph associated with
the interpoint distance matrix, and showed that this variable
can be used to accurately (from a probabilistic viewpoint)
identify the unknown dimension at low computational cost.
Brito [5] discussed the application of linear combinations of
the degree frequencies in the minimal spanning tree to the
problem of identifying the appropriate dimension for a data set
from its interpoint distance matrix. Costa [7] and Yang [30]
studied the data dimension estimation problem by using trees
to approximate manifold structures. All of these methods focus
on unsupervised learning, and do not utilize the information
of data-class labels that are available in supervised subspace
learning.
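As a concrete illustration of this family of nearest-neighbor-based intrinsic dimension estimators, the sketch below implements the Levina–Bickel maximum-likelihood estimator — a method in the same spirit as, though distinct from, the graph-based estimators of [4] and [5] — which infers the dimension from the log-ratios of each point's nearest-neighbor distances. The data set is synthetic: a flat 2-D square embedded in a 5-D ambient space.

```python
import numpy as np

def intrinsic_dim_mle(X, k=10):
    """Levina-Bickel MLE of intrinsic dimension: for each point, the
    inverse mean log-ratio of its k-th to j-th nearest-neighbor
    distance (j = 1, ..., k-1), averaged over all points."""
    # Pairwise Euclidean distance matrix.
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    # Sort each row; column 0 is the zero self-distance, so drop it.
    knn = np.sort(d, axis=1)[:, 1:k + 1]
    # log(T_k / T_j) for j = 1, ..., k-1.
    logs = np.log(knn[:, -1:] / knn[:, :-1])
    # Per-point estimate is (k-1) / sum of logs; average the estimates.
    return float(np.mean((k - 1) / logs.sum(axis=1)))

rng = np.random.default_rng(1)
# 1000 points on a 2-D manifold (a flat square) in 5-D ambient space.
X = np.zeros((1000, 5))
X[:, :2] = rng.uniform(size=(1000, 2))
print(intrinsic_dim_mle(X))  # close to 2 for this data
```

Like the cited unsupervised estimators, this one ignores class labels entirely, which is exactly the gap the present paper addresses.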
Motivated by the above observations, we present a param-
eter-free framework for general supervised subspace learning
by following our previous work on graph embedding as a
unified framework for subspace learning [29]. The new frame-
work searches for a low-dimensional feature space where the
neighboring points of each datum share the same class label,
which is optimal in the sense of nearest neighbor classification.
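This neighborhood criterion can be made operational: a feature space is good when, for every datum, its nearest neighbors all carry the datum's own class label. The numpy sketch below checks exactly that property on given features and labels; the function name and the toy two-class data are hypothetical illustrations, not part of the paper's SDP formulation.

```python
import numpy as np

def neighborhood_homogeneity(X, y, k=3):
    """Fraction of points whose k nearest neighbors (excluding the
    point itself) all share the point's class label -- the property
    the learned low-dimensional feature space should satisfy."""
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    np.fill_diagonal(d, np.inf)          # exclude self-matches
    nn = np.argsort(d, axis=1)[:, :k]    # indices of k nearest neighbors
    return float(np.mean(np.all(y[nn] == y[:, None], axis=1)))

rng = np.random.default_rng(2)
# Two well-separated Gaussian classes: homogeneity should be perfect.
A = rng.standard_normal((50, 2))
B = rng.standard_normal((50, 2)) + 10.0
X = np.vstack([A, B])
y = np.repeat([0, 1], 50)
print(neighborhood_homogeneity(X, y))  # 1.0
```

A homogeneity of 1.0 means every training sample would be correctly classified by a nearest neighbor rule in that space, which is the guarantee PFGE encodes as constraints rather than checking after the fact.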
The whole framework, referred to as parameter-free graph
embedding (PFGE), consists of the following steps. First,
instead of directly computing the transformation matrix for
dimensionality reduction, we search for the feature Gram
matrix (i.e., the product of the transformation matrix and its
transpose). Then, the ratio form of the objective function in
the graph embedding framework [29] is transformed into a
difference form in PFGE. After that, the feature Gram ma-
trix is learned by posing the supervised subspace learning