Proceedings of the International Joint Conference on Neural Networks, Orlando, Florida, USA, August 12-17, 2007

Search Strategies Guided by the Evidence for the Selection of Basis Functions in Regression

Ignacio Barrio, Enrique Romero, and Lluis Belanche

Abstract— This work addresses the problem of selecting a subset of basis functions for a model linear in the parameters for regression tasks. Basis functions from a set of candidates are explicitly selected with search methods coming from the feature selection field. Following approximate Bayesian inference, the search is guided by the evidence. The tradeoff between model complexity and computational cost can be controlled by choosing the search strategy. The experimental results show that, under mild assumptions, compact and very competitive models are usually found.

I. INTRODUCTION

In regression tasks we are given a data set of input vectors $\{x_n\}_{n=1}^N$ and corresponding target values $\{t_n\}_{n=1}^N$, where $t_n \in \mathbb{R}$. The objective is to infer a function $y(x)$ that underlies the training data and makes good predictions on unseen input vectors. A very common choice is a linear model with $m < N$ fixed basis functions $\phi_i$:

$$y(x; w) = \sum_{i=1}^{m} w_i \phi_i(x),$$

where $w = (w_1, w_2, \ldots, w_m)^T$ are the model parameters. Since the model is linear in the parameters, these are easy to estimate, and the main problem lies in the selection of the $m$ basis functions ($m$ is unknown a priori) from a dictionary. In machine learning, using a dictionary of basis functions centered at the input data usually gives good results [1].

This problem has mainly been tackled in two different ways, according to the implicit or explicit nature of the selection process. In implicit selection methods, the model with the whole set of basis functions is considered and the parameters are then computed in such a way that many of them become zero.
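As an illustrative sketch (not the authors' implementation), such a linear-in-the-parameters model with a dictionary of Gaussian basis functions centered at the training inputs can be fitted by ordinary least squares; the kernel width used here is an assumed hyperparameter:

```python
import numpy as np

def gaussian_design(X, centers, width=1.0):
    """Design matrix Phi[n, i] = exp(-||x_n - c_i||^2 / (2 * width^2))."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * width ** 2))

# Toy 1-D regression problem: targets are a noisy sine.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(40, 1))
t = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(40)

# Dictionary with one basis function per training input (here m = N).
Phi = gaussian_design(X, X, width=1.0)

# Least-squares parameter estimate: w minimizes ||Phi w - t||^2.
w, *_ = np.linalg.lstsq(Phi, t, rcond=None)

y = Phi @ w  # model predictions on the training inputs
```

With the full dictionary the fit is easy but the model is as large as the data set, which is exactly why a subset of the basis functions must be selected.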
This is the case of Support Vector Machines (SVM) [2], Basis Pursuit (BP) [3], the Least Absolute Shrinkage and Selection Operator (LASSO) [4] and Relevance Vector Machines (RVM) [5]. In explicit selection methods, a search is carried out guided by the minimization of some cost function. This category includes Matching Pursuit (MP) [6], Orthogonal Least Squares (OLS) [7], Kernel Matching Pursuit (KMP) [8], and some Gaussian process approximations [9], [10], among others. All these methods use forward selection as the search strategy.

Explicit selection methods use two criteria: an objective (or cost) function that conducts the search (e.g., the training set sum-of-squares error) and an evaluation function to check model performance, eventually used to stop the process (e.g., the validation set sum-of-squares error). The evaluation function is commonly used to avoid overfitting. This duality hinders the use of more powerful search strategies, which would strongly minimize the first criterion but not necessarily the second one. The choice of a proper objective function is therefore encouraged if powerful search strategies are to be used.

In a Bayesian setting, under the use of certain priors, there is no need to limit the size of the network to avoid overfitting [11]. However, simpler models are more beneficial for computational reasons. Gaussian processes have been approximated with a subset of regressors [12], and the subset has been selected with forward selection maximizing the marginal likelihood [10], which serves as both the objective and the evaluation function. In the context of linear models, the use of the evidence has been suggested to compare different models, given that it penalizes complex models and there is (anti)correlation between model evidence and generalization error [13], [14].

(The authors are with the Soft Computing Group, Universitat Politecnica de Catalunya, Barcelona, Spain; email: {ibarrio, eromero, belanche}@lsi.upc.edu.)
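The two-criteria scheme described above can be sketched as plain forward selection: the training sum-of-squares error drives the search, while a held-out validation error is only consulted to stop it. This is an illustrative sketch of the duality, not the implementation of any of the cited methods (OLS and KMP use much cheaper incremental updates than the full refits done here):

```python
import numpy as np

def forward_selection(Phi_tr, t_tr, Phi_va, t_va, max_size=20):
    """Greedily add the column that most reduces training SSE (objective);
    stop when validation SSE (evaluation) no longer improves."""
    chosen, best_va = [], np.inf
    remaining = list(range(Phi_tr.shape[1]))
    while remaining and len(chosen) < max_size:
        def train_sse(cols):
            # Refit least squares on the selected columns, return training SSE.
            w, *_ = np.linalg.lstsq(Phi_tr[:, cols], t_tr, rcond=None)
            return ((Phi_tr[:, cols] @ w - t_tr) ** 2).sum()

        # Objective function: pick the candidate with lowest training SSE.
        _, j_best = min((train_sse(chosen + [j]), j) for j in remaining)
        cand = chosen + [j_best]

        # Evaluation function: validation SSE, used only as a stopping rule.
        w, *_ = np.linalg.lstsq(Phi_tr[:, cand], t_tr, rcond=None)
        va = ((Phi_va[:, cand] @ w - t_va) ** 2).sum()
        if va >= best_va:
            break
        best_va = va
        chosen.append(j_best)
        remaining.remove(j_best)
    return chosen
```

Note how a more aggressive search could keep lowering the training SSE while the validation SSE worsens; this is precisely the tension that motivates using a single criterion such as the evidence.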
In this work we propose an explicit search guided by the evidence for the model. The evidence is both the search objective function and the evaluation function. Several algorithms borrowed from the feature selection field are used as search methods. A fast implementation of the whole process is developed. An experimental study shows that these Search Strategies Guided by the Evidence (SSGE) find compact models that are very competitive with other state-of-the-art techniques such as SVMs and RVMs. More powerful SSGE tend to find more compact models than simpler ones, with slightly worse prediction accuracy. By choosing the search strategy, the resulting model complexity and the computational cost can be controlled. This control is not possible for SVM and RVM.

The rest of this work is organized as follows. Section II reviews a Bayesian approach for regression with linear models. Section III enumerates some common feature selection search strategies. Section IV presents the SSGE. An experimental study comparing different methods is carried out in Section V, and the results are discussed in Section VI. Finally, Section VII concludes the paper.

II. A BAYESIAN APPROACH FOR LINEAR MODELS

We briefly review the noisy interpolation problem and the three levels of inference in a Bayesian framework [14]. The first level considers the posterior distribution over the parameters, the second adapts the hyperparameters that control the parameters, and the third allows the comparison of different models. We assume the targets to be deviated from the underlying function by independent additive noise.
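In the linear-Gaussian setting the evidence has a closed form. A minimal sketch, assuming a zero-mean Gaussian prior $w \sim \mathcal{N}(0, \alpha^{-1}I)$ and Gaussian noise with precision $\beta$ (the hyperparameters $\alpha$ and $\beta$ are taken as given here, although the framework allows adapting them at the second level of inference):

```python
import numpy as np

def log_evidence(Phi, t, alpha, beta):
    """Log marginal likelihood log p(t | alpha, beta) of a linear model
    with Gaussian prior on w (precision alpha) and noise precision beta."""
    N, m = Phi.shape
    A = alpha * np.eye(m) + beta * Phi.T @ Phi      # posterior precision of w
    mN = beta * np.linalg.solve(A, Phi.T @ t)       # posterior mean of w
    # Regularized error of the posterior mean.
    E = beta / 2 * ((t - Phi @ mN) ** 2).sum() + alpha / 2 * (mN ** 2).sum()
    _, logdetA = np.linalg.slogdet(A)
    return (m * np.log(alpha) + N * np.log(beta)
            - 2 * E - logdetA - N * np.log(2 * np.pi)) / 2
```

This quantity is what the proposed search strategies maximize over subsets of the dictionary: each candidate subset defines a design matrix `Phi`, and its evidence serves simultaneously as objective and evaluation function.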
DOI 10.1109/IJCNN.2007.4370996