IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 24, NO. 10, OCTOBER 2013 1575
Automated Induction of Heterogeneous Proximity
Measures for Supervised Spectral Embedding
Eduardo Rodriguez-Martinez, Tingting Mu, Member, IEEE, Jianmin Jiang,
and John Yannis Goulermas, Senior Member, IEEE
Abstract— Spectral embedding methods have played a very
important role in dimensionality reduction and feature generation
in machine learning. Supervised spectral embedding methods
additionally improve the classification of labeled data, using
proximity information that considers both features and class
labels. However, these methods compute the proximity information by
treating all intraclass similarities homogeneously for all classes,
and similarly for all interclass samples. In this paper, we propose
a novel and generic method that can treat all the intra- and
interclass sample similarities heterogeneously by potentially using
a different proximity function for each class and each class pair.
To handle the complexity of selecting these functions, we employ
evolutionary programming as an automated powerful formula
induction engine. In addition, for computational efficiency and
expressive power, we use a compact matrix tree representation
equipped with a broad set of functions that can build most
currently used similarity functions as well as new ones. Model
selection is data driven, because the entire model is symbolically
instantiated using only problem training data, and no user-
selected functions or parameters are required. We perform thorough
comparative experiments with multiple classification
datasets and many existing state-of-the-art embedding methods,
which show that the proposed algorithm is very competitive in
terms of classification accuracy and generalization ability.
Index Terms— Distance metric learning, evolutionary
optimization, heterogeneous proximity information, spectral
dimensionality reduction.
I. INTRODUCTION
CLASSIC feature extraction frequently relies on linear
techniques, such as principal component analysis [1]
techniques, such as principal component analysis [1]
and Fisher discriminant analysis (FDA) [2], to provide low-
dimensional projections at low computational costs. However,
recent evidence suggests that the use of nonlinear embeddings
of data originally lying in low-dimensional manifolds can
be more effective [3]. A number of unsupervised spectral
embedding methods have been proposed [4]–[6], along with
their linear out-of-sample extensions [7]–[10]. These preserve
certain characteristics of the original high-dimensional space.
Manuscript received July 10, 2012; revised November 10, 2012 and February 24, 2013; accepted April 30, 2013. Date of publication June 12, 2013; date of current version September 27, 2013. This work was supported by CONACyT under Scholarship 19629.
E. Rodriguez-Martinez, T. Mu, and J. Y. Goulermas are with the Department of Electrical Engineering and Electronics, The University of Liverpool, Liverpool L69 3GJ, U.K. (e-mail: edrom@liverpool.ac.uk; t.mu@liverpool.ac.uk; j.y.goulermas@liverpool.ac.uk).
J. Jiang is with the School of Computer Science and Technology, Tianjin University, Tianjin 300072, China, and also with the Department of Computing, University of Surrey, Guildford GU2 7XH, U.K. (e-mail: jmjiang@tju.edu.cn).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TNNLS.2013.2261613
For instance, the locality preserving projections (LPP)
method [8] and the orthogonal LPP (OLPP) [7] retain the
aggregate pairwise proximity information based on local
neighborhood graphs, while the orthogonal neighborhood pre-
serving projections (ONPP) [7] keep the reconstruction error
from neighboring samples. Nevertheless, in an unsupervised
setup, neighboring points near the class boundaries may
get projected to the wrong class, and this often increases
misclassification rates. As such, several supervised spectral
embedding (SSE) alternatives are proposed to alleviate this
problem. Among these, the methods closely related to FDA [11]–
[16] use the between- and within-class information to restrict
the embeddings, whereas another class modifies the proximity
definition to incorporate the label information through various
other formulations [17]–[24].
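The graph-based embeddings discussed above share a common template: build a pairwise weight matrix from the data (and, in the supervised case, the labels), form its graph Laplacian, and solve a generalized eigenproblem for the projection. The sketch below is a minimal LPP-style illustration of this template, not the authors' algorithm; it uses a single Gaussian similarity shared homogeneously by all classes, which is exactly the homogeneous treatment the paper sets out to generalize. The bandwidth `sigma` and the small regularizer are illustrative choices.

```python
import numpy as np
from scipy.linalg import eigh

def supervised_lpp(X, y, n_components=2, sigma=1.0):
    """LPP-style supervised embedding: one Gaussian kernel for all
    intraclass pairs, zero weight for interclass pairs."""
    # Pairwise squared Euclidean distances
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-D2 / (2 * sigma ** 2))
    W *= (y[:, None] == y[None, :])       # keep intraclass edges only
    np.fill_diagonal(W, 0.0)
    Dg = np.diag(W.sum(axis=1))           # degree matrix
    L = Dg - W                            # graph Laplacian
    # Generalized eigenproblem X^T L X v = lambda X^T Dg X v;
    # the smallest eigenvalues yield the projection directions.
    A = X.T @ L @ X
    B = X.T @ Dg @ X + 1e-6 * np.eye(X.shape[1])  # regularize for stability
    _, vecs = eigh(A, B)                  # eigenvalues in ascending order
    return vecs[:, :n_components]         # d x n_components projection

# Usage on toy two-class data
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 5)), rng.normal(3, 1, (20, 5))])
y = np.array([0] * 20 + [1] * 20)
V = supervised_lpp(X, y)
Z = X @ V   # low-dimensional embedding of shape (40, 2)
```

Replacing the single Gaussian with a different similarity function per class, and per class pair, is the heterogeneity that the proposed method induces automatically.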
Despite the existence of a wide range of current SSE
methods, according to no free lunch analyses [25] there is
no single method that can be optimal for all classification
problems. In practice, different alternatives exist for choosing
an SSE method for a given classification task. One possibility
is the selection of the best performing SSE method among
currently existing ones. Such a selection process could be
modeled as a computationally expensive grid search over
all existing models, involving parameter training for each
individual model. Although better search techniques exist to
ease the computational burden of direct search [26], [27],
the selected model may still be suboptimal because of the
assumptions it makes about the characteristics of the
problem and data. Another alternative is to specifically design
a suitable SSE method for the problem at hand. This task
may need the involvement of human experts to analyze the
data, characterize the problem and eventually propose a new
mathematical model.
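The first alternative, direct search over existing methods, can be sketched as follows: each candidate embedder is fitted and scored, and the best scorer is kept. The two candidate embedders here (plain PCA and a mean-difference projection) and the leave-one-out 1-NN scoring are illustrative stand-ins for a library of SSE methods and a proper validation protocol, not the paper's experimental setup.

```python
import numpy as np

def pca_projection(X, y, k=1):
    # Unsupervised baseline: top principal direction(s)
    Xc = X - X.mean(0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:k].T                        # d x k

def mean_diff_projection(X, y, k=1):
    # Simple supervised baseline: direction between class means
    w = X[y == 1].mean(0) - X[y == 0].mean(0)
    return (w / np.linalg.norm(w)).reshape(-1, 1)

def loo_1nn_accuracy(Z, y):
    # Leave-one-out 1-nearest-neighbor accuracy in the embedded space
    D = ((Z[:, None] - Z[None, :]) ** 2).sum(-1)
    np.fill_diagonal(D, np.inf)            # exclude each point itself
    return (y[D.argmin(1)] == y).mean()

def select_embedding(X, y, candidates):
    scores = {name: loo_1nn_accuracy(X @ f(X, y), y)
              for name, f in candidates.items()}
    return max(scores, key=scores.get), scores

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (30, 4)), rng.normal(2, 1, (30, 4))])
y = np.array([0] * 30 + [1] * 30)
best, scores = select_embedding(
    X, y, {"pca": pca_projection, "mean_diff": mean_diff_projection})
```

Each added candidate multiplies the cost by its own parameter grid, which is the computational burden that motivates an automated, data-driven design process instead.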
In this paper, we provide a radically different human-
competitive alternative to the design of bespoke SSE methods,
which can also be envisaged as a systemic distance metric
learning approach. We pose the design process as a complex
inference task, in which the optimal model is learnt solely from
the dataset at hand via the means of a genetic programming
(GP)-based model search. This GP search relies on a novel
encoding scheme expressing each potential model as a
set of similarity functions, which are then used to construct
the characteristic weighting matrix for the classic embedding
optimization problem. To maintain a very broad solution space
of possible models that our algorithm can create, we employ
2162-237X © 2013 IEEE