Proceedings of International Joint Conference on Neural Networks, Orlando, Florida, USA, August 12-17, 2007
Search Strategies Guided by the Evidence for the Selection of Basis
Functions in Regression
Ignacio Barrio, Enrique Romero, and Lluis Belanche
Abstract- This work addresses the problem of selecting a
subset of basis functions for a model linear in the parameters
for regression tasks. Basis functions from a set of candidates are
explicitly selected with search methods coming from the feature
selection field. Following approximate Bayesian inference, the
search is guided by the evidence. The tradeoff between model
complexity and computational cost can be controlled by
choosing the search strategy. The experimental results show that,
under mild assumptions, compact and very competitive models
are usually found.
I. INTRODUCTION
In regression tasks we are given a data set of input vectors
$\{\mathbf{x}_n\}_{n=1}^{N}$ and corresponding target values
$\{t_n\}_{n=1}^{N}$, where $t_n \in \mathbb{R}$. The objective is to
infer a function $y(\mathbf{x})$ that underlies the training data and
makes good predictions on unseen input vectors. A very common choice
is obtained by a linear model with $m < N$ fixed basis functions
$\phi_i$:

$$y(\mathbf{x}; \mathbf{w}) = \sum_{i=1}^{m} w_i \phi_i(\mathbf{x}),$$

where $\mathbf{w} = (w_1, w_2, \ldots, w_m)^T$ are the model
parameters. Since the model is linear in the parameters, these are
easy to estimate and the main problem lies in the selection of the
$m$ basis functions ($m$ is unknown a priori) from a dictionary.
In machine learning, using a dictionary of basis functions
centered at the input data usually gives good results [1].
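As a short illustration of the linear-in-the-parameters property (our own sketch; the function names, the Gaussian basis width, and the toy data are illustrative, not from the paper), the weights of such a model with basis functions centered at training inputs follow directly from ordinary least squares:

```python
import numpy as np

def gaussian_design_matrix(X, centers, width):
    """Phi[n, i] = exp(-||x_n - c_i||^2 / (2 * width^2))."""
    sq_dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dists / (2.0 * width ** 2))

# Toy 1-D regression data
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(40, 1))
t = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(40)

# Dictionary of Gaussian basis functions centered at a subset of the inputs
centers = X[:10]
Phi = gaussian_design_matrix(X, centers, width=1.0)

# Linear in the parameters: ordinary least squares yields w in closed form
w, *_ = np.linalg.lstsq(Phi, t, rcond=None)
y = Phi @ w  # predictions on the training inputs
```

The hard part, as the text notes, is not computing `w` but deciding which columns of the full dictionary should enter `Phi` in the first place.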
This problem has been mainly tackled in two different
ways, according to the implicit or explicit nature of the
selection process. In implicit selection methods, the model
with the whole set of basis functions is considered and then
the parameters are computed in such a way that many of them
become zero. This is the case of Support Vector Machines (SVM) [2],
Basis Pursuit (BP) [3], Least Absolute Shrinkage and Selection
Operator (LASSO) [4] and Relevance Vector Machines (RVM) [5]. In
explicit selection methods a search
is carried out guided by the minimization of some cost
function. This category includes Matching Pursuits (MP)
[6], Orthogonal Least Squares (OLS) [7], Kernel Matching
Pursuit (KMP) [8], or some Gaussian process approximations [9], [10],
among others. All these methods use forward
selection as the search strategy.
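A minimal sketch of forward selection driven by the training sum-of-squares error may clarify the strategy shared by these methods (this is our own naive illustration, not any specific cited algorithm; OLS and KMP use far more efficient incremental updates):

```python
import numpy as np

def forward_selection(Phi, t, m_max):
    """Greedy forward selection: at each step add the candidate basis
    function (column of Phi) that most reduces the training SSE."""
    N, M = Phi.shape
    selected = []
    for _ in range(m_max):
        best_err, best_j = np.inf, None
        for j in range(M):
            if j in selected:
                continue
            cols = selected + [j]
            # Refit the linear model with the candidate column included
            w, *_ = np.linalg.lstsq(Phi[:, cols], t, rcond=None)
            err = float(((Phi[:, cols] @ w - t) ** 2).sum())
            if err < best_err:
                best_err, best_j = err, j
        selected.append(best_j)
    return selected
```

Note the naive refit at every candidate makes this O(m·M) least-squares solves; practical implementations update the solution incrementally as columns are added.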
Explicit selection methods use two criteria: an objective
(or cost) function that conducts the search (e.g., the training
set sum-of-squares error) and an evaluation function to check
model performance, possibly used to stop the process (e.g.,
the validation set sum-of-squares error). The evaluation function is
commonly used to avoid overfitting. This duality hinders the use of
more powerful search strategies, which could greatly reduce the first
criterion but not necessarily the second one. The choice of a proper
objective function is therefore encouraged if powerful search
strategies are to be used.

The authors are with the Soft Computing Group, Universitat
Politècnica de Catalunya, Barcelona, Spain (email: {ibarrio; eromero;
belanche}@lsi.upc.edu).
In a Bayesian setting, under the use of certain priors,
there is no need to limit the size of the network to avoid
overfitting [11]. However, simpler models are more beneficial
for computational reasons. Gaussian processes have been
approximated with a subset of regressors [12] and the subset
has been selected with forward selection maximizing the marginal
likelihood [10], which serves as both the objective and the
evaluation function. In the context of linear models, the use
of the evidence has been suggested to compare different
models given that it penalizes complex models and there is
(anti)correlation between model evidence and generalization
error [13], [14].
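For concreteness, under a Gaussian prior on the weights and Gaussian noise, the evidence of a linear model has a standard closed form (a sketch following the usual Bayesian linear-model treatment; here $\alpha$ denotes the prior precision, $\sigma^2$ the noise variance, and $\mathbf{\Phi}$ the design matrix — symbols of our choosing, not necessarily the paper's notation):

```latex
p(\mathbf{t} \mid \alpha, \sigma^2)
  = \int p(\mathbf{t} \mid \mathbf{w}, \sigma^2)\,
         p(\mathbf{w} \mid \alpha)\, d\mathbf{w}
  = \mathcal{N}\!\left(\mathbf{t} \,\middle|\, \mathbf{0},\;
      \sigma^2 \mathbf{I} + \alpha^{-1}\mathbf{\Phi}\mathbf{\Phi}^T\right)
```

Because the covariance grows with every basis function added, the evidence automatically penalizes models that are more complex than the data warrant.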
In this work we propose an explicit search guided by the
evidence for the model. The evidence is both the search
objective function and the evaluation function. Several algo-
rithms borrowed from the feature selection field are used as
search methods. A fast implementation of the whole process
is developed. An experimental study shows that these Search
Strategies Guided by the Evidence (SSGE) find compact
models very competitive with other state-of-the-art tech-
niques such as SVMs and RVMs. More powerful SSGE tend to find more
compact models than simpler strategies do, at the cost of slightly
worse prediction accuracy. By choosing the search strategy
the resulting model complexity and the computational cost
can be controlled. This control is not possible for SVM and
RVM.
The rest of this work is organized as follows. Section II
reviews a Bayesian approach for regression with linear
models. Section III enumerates some common feature se-
lection search strategies. Section IV presents the SSGE. An
experimental study comparing different methods is carried
out in Section V and we discuss the results obtained in
Section VI. Finally we conclude the paper in Section VII.
II. A BAYESIAN APPROACH FOR LINEAR MODELS
We briefly review the noisy interpolation problem and
the three levels of inference in a Bayesian framework [14].
The first one considers the posterior distribution over the
parameters, the second one adapts the hyperparameters that
control the parameters, and the third one allows the comparison of
different models. We assume the targets to be deviated
from the underlying function by independent additive noise
© 2007 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media,
including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to
servers or lists, or reuse of any copyrighted component of this work in other works. DOI 10.1109/IJCNN.2007.4370996