IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 24, NO. 10, OCTOBER 2013

Automated Induction of Heterogeneous Proximity Measures for Supervised Spectral Embedding

Eduardo Rodriguez-Martinez, Tingting Mu, Member, IEEE, Jianmin Jiang, and John Yannis Goulermas, Senior Member, IEEE

Abstract— Spectral embedding methods have played an important role in dimensionality reduction and feature generation in machine learning. Supervised spectral embedding methods additionally improve the classification of labeled data, using proximity information that considers both features and class labels. However, they calculate this proximity information by treating all intraclass similarities homogeneously for all classes, and likewise all interclass samples. In this paper, we propose a novel and generic method that can treat all intra- and interclass sample similarities heterogeneously, by potentially using a different proximity function for each class and each class pair. To handle the complexity of selecting these functions, we employ evolutionary programming as a powerful automated formula-induction engine. In addition, for computational efficiency and expressive power, we use a compact matrix tree representation equipped with a broad set of functions that can build most currently used similarity functions as well as new ones. Model selection is data driven, because the entire model is symbolically instantiated using only problem training data, and no user-selected functions or parameters are required. We perform thorough comparative experiments with multiple classification datasets and many existing state-of-the-art embedding methods, which show that the proposed algorithm is very competitive in terms of classification accuracy and generalization ability.

Index Terms— Distance metric learning, evolutionary optimization, heterogeneous proximity information, spectral dimensionality reduction.

Manuscript received July 10, 2012; revised November 10, 2012 and February 24, 2013; accepted April 30, 2013. Date of publication June 12, 2013; date of current version September 27, 2013. This work was supported by CONACyT under Scholarship 19629. E. Rodriguez-Martinez, T. Mu, and J. Y. Goulermas are with the Department of Electrical Engineering and Electronics, The University of Liverpool, Liverpool L69 3GJ, U.K. (e-mail: edrom@liverpool.ac.uk; t.mu@liverpool.ac.uk; j.y.goulermas@liverpool.ac.uk). J. Jiang is with the School of Computer Science and Technology, Tianjin University, Tianjin 300072, China, and also with the Department of Computing, University of Surrey, Guildford GU2 7XH, U.K. (e-mail: jmjiang@tju.edu.cn). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TNNLS.2013.2261613

I. INTRODUCTION

Classic feature extraction frequently relies on linear techniques, such as principal component analysis [1] and Fisher discriminant analysis (FDA) [2], to provide low-dimensional projections at low computational cost. However, recent evidence suggests that nonlinear embeddings of data originally lying on low-dimensional manifolds can be more effective [3]. A number of unsupervised spectral embedding methods have been proposed [4]–[6], along with their linear out-of-sample extensions [7]–[10]. These preserve certain characteristics of the original high-dimensional space. For instance, the locality preserving projections (LPP) method [8] and the orthogonal LPP (OLPP) [7] retain the aggregate pairwise proximity information based on local neighborhood graphs, while the orthogonal neighborhood preserving projections (ONPP) [7] preserve the reconstruction error from neighboring samples. Nevertheless, in an unsupervised setup, neighboring points near the class boundaries may be projected into the wrong class, which often increases misclassification rates.
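To make the shared graph-based construction concrete, the following is a minimal sketch of an LPP-style embedding. This is our own illustration, not code from the paper: it assumes a k-nearest-neighbor heat-kernel affinity, and the parameters `k`, `sigma`, and `d` are illustrative placeholders.

```python
import numpy as np
from scipy.linalg import eigh

def lpp_embedding(X, k=5, sigma=1.0, d=2):
    """Sketch of locality preserving projections (LPP):
    build a k-NN heat-kernel affinity W over the samples, then solve
    the generalized eigenproblem X^T L X a = lambda X^T D X a,
    where L = D - W is the graph Laplacian."""
    n = X.shape[0]
    # pairwise squared Euclidean distances
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(D2[i])[1:k + 1]          # k nearest neighbors (skip self)
        W[i, nbrs] = np.exp(-D2[i, nbrs] / (2 * sigma ** 2))
    W = np.maximum(W, W.T)                          # symmetrize the neighborhood graph
    Dm = np.diag(W.sum(axis=1))                     # degree matrix
    L = Dm - W                                      # graph Laplacian
    # the smallest generalized eigenvectors give the projection directions
    vals, vecs = eigh(X.T @ L @ X, X.T @ Dm @ X)
    return X @ vecs[:, :d]                          # n x d linear embedding
```

Because the affinity W depends only on feature-space distances, label information plays no role here, which is exactly why boundary points can end up near the wrong class after projection.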
As such, several supervised spectral embedding (SSE) alternatives have been proposed to alleviate this problem. Among these, the methods closely related to FDA [11]–[16] use between- and within-class information to constrain the embeddings, whereas another class of methods modifies the proximity definition to incorporate the label information through various other formulations [17]–[24]. Despite the wide range of existing SSE methods, no free lunch analyses [25] imply that no single method can be optimal for all classification problems.

In practice, different alternatives exist for choosing an SSE method for a given classification task. One possibility is to select the best performing method among currently existing ones. Such a selection process could be modeled as a computationally expensive grid search over all existing models, involving parameter training for each individual model. Although better search techniques exist to ease the computational burden of direct search [26], [27], the selected model may still be suboptimal because of the assumptions it is likely to make regarding the characteristics of the problem and data. Another alternative is to design a bespoke SSE method for the problem at hand. This task may require human experts to analyze the data, characterize the problem, and eventually propose a new mathematical model.

In this paper, we provide a radically different, human-competitive alternative to the design of bespoke SSE methods, which can also be viewed as a systematic distance metric learning approach. We pose the design process as a complex inference task, in which the optimal model is learned solely from the dataset at hand by means of a genetic programming (GP) based model search.
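The heterogeneous treatment of proximities can be sketched as follows. This is our own illustration, not the paper's implementation: the per-class and per-class-pair similarity functions below are hand-written placeholders (a heat kernel, a cosine similarity, and a constant interclass penalty), whereas the paper induces such functions automatically via GP.

```python
import numpy as np

def heterogeneous_affinity(X, y, sim_funcs, default=lambda a, b: 0.0):
    """Build a weighting matrix using a (possibly different) similarity
    function for each class and each class pair, supplied as a dict
    keyed by the sorted pair of class labels."""
    n = X.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            key = (int(min(y[i], y[j])), int(max(y[i], y[j])))
            f = sim_funcs.get(key, default)
            W[i, j] = W[j, i] = f(X[i], X[j])
    return W

# Illustrative placeholder choices: heat kernel within class 0,
# cosine similarity within class 1, a constant penalty across classes.
sims = {
    (0, 0): lambda a, b: np.exp(-np.sum((a - b) ** 2)),
    (1, 1): lambda a, b: float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)),
    (0, 1): lambda a, b: -0.1,
}
```

A conventional SSE method would use one fixed function for all intraclass pairs and one for all interclass pairs; the dictionary above makes explicit where that homogeneity assumption can be relaxed.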
This GP search relies on a novel encoding scheme that expresses each potential model as a set of similarity functions, which are then used to construct the characteristic weighting matrix for the classic embedding optimization problem. To maintain a broad solution space of possible models that our algorithm can create, we employ

2162-237X © 2013 IEEE