A Framework for Multiscale and Hybrid RKHS-Based
Approximators
Michaël A. van Wyk, Member, IEEE, and Tariq S. Durrani, Fellow, IEEE
Abstract—A generalized framework for deriving multiscale and hybrid functionally expanded approximators that are linear in the adjustable weights is presented. The basic idea is to define one or more appropriate function spaces and then to impose a geometric structure on these to obtain reproducing kernel Hilbert spaces (RKHSs) [1]. The weight identification problem is formulated as a minimum norm optimization problem that produces an approximation network structure comprising a linear weighted sum of displaced reproducing kernels fed by the input signals. Examples of the application of this framework are discussed. Results of numerical experiments are presented.
Index Terms—Approximation networks, neural networks, nonlinear functional approximation, reproducing kernel Hilbert spaces.
I. INTRODUCTION
RECENTLY, several authors have studied the problem of identification and approximation of nonlinear mappings using approximation networks that are linear in the parameters. A generalized Fock space framework for the analysis and identification of nonlinear mappings described by Volterra series and nonlinear dynamical systems of the type described by Volterra series operators was proposed by de Figueiredo et al. [2], [3]. Specifically, this identification procedure is based on the construction of an RKHS, namely, a symmetric Fock space, for the Volterra series mapping (or operator, respectively). The significance of de Figueiredo's framework is that it produces a representation of the nonlinear mapping or system that may be viewed as a three-layer linear-in-the-parameters approximator with the output layer consisting of a single summation. This makes these approximators suitable for adaptive learning using linear adaptive filter algorithms.
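To make this structure concrete, the following minimal sketch (an illustration under stated assumptions, not the construction of [2] and [3]) trains an approximator of the form f(x) = sum_i w_i K(x, c_i) with the LMS rule; the Gaussian kernel, the grid of centers, the target mapping, and the step size are all assumptions made for the example.

import numpy as np

# Illustrative sketch: a linear-in-the-weights kernel approximator
#   f(x) = sum_i w_i * K(x, c_i),
# trained with the LMS rule. The Gaussian kernel, the fixed grid of
# centers c_i, the target mapping, and the step size mu are all
# assumptions made for this example.

def gaussian_kernel(x, c, width=0.3):
    return np.exp(-((x - c) ** 2) / (2.0 * width ** 2))

rng = np.random.default_rng(0)
centers = np.linspace(-1.0, 1.0, 15)        # displaced kernel centers
weights = np.zeros_like(centers)            # adjustable weights
mu = 0.1                                    # LMS step size

def target(x):                              # "unknown" mapping to identify
    return np.sin(3.0 * x)

for _ in range(2000):
    x = rng.uniform(-1.0, 1.0)              # training sample
    phi = gaussian_kernel(x, centers)       # hidden-layer kernel outputs
    err = target(x) - weights @ phi         # output error
    weights += mu * err * phi               # LMS weight update

Because the model is linear in the weights, any linear adaptive filter algorithm (LMS above, or RLS) applies unchanged; only the hidden layer of displaced kernels is nonlinear.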
Poggio and Girosi [6] developed a framework for the approximation of nonlinear mappings based on regularization techniques [7]. Their approach leads to a class of three-layer approximation networks called regularization networks, which includes radial basis function (RBF) networks as a special case. First, the regularization functional for the given approximation problem is constructed. It contains a differential operator that
embodies the a priori information about the solution. Minimization of this functional with respect to the approximating function leads to the associated Euler–Lagrange equation, which is a partial differential equation whose solution may be written as an integral transform. This integral transform turns out to be a weighted sum of displaced replicas of the Green's function of the adjoint operator composed with the above-mentioned differential operator. These approximation networks are therefore linear in the weights. In [6], Poggio and Girosi also present an adaptive learning algorithm that adapts both the weights and the kernel centers in search of the optimal solution. The main drawback of regularization networks is that complex differential equations have to be solved to obtain the kernel function (or activation function) for the approximation problem.
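As a concrete illustration of the resulting network structure (not of the derivation itself), the sketch below fits a regularization network whose kernel is taken to be Gaussian, with the centers placed at the training inputs; the weights then solve the regularized linear system (G + lambda*I)w = y, where G is the Gram matrix of the kernel. The kernel width, the regularization parameter, and the data are assumptions made for the example.

import numpy as np

# Illustrative sketch of a regularization (RBF) network: with centers
# at the training inputs, the weights solve (G + lam*I) w = y, where
# G is the Gram matrix of the kernel. The Gaussian kernel, its width,
# lam, and the synthetic data are assumptions made for this example.

def gram(xa, xb, width=0.2):
    d = xa[:, None] - xb[None, :]
    return np.exp(-d ** 2 / (2.0 * width ** 2))

rng = np.random.default_rng(1)
x_train = np.sort(rng.uniform(-1.0, 1.0, 40))
y_train = np.sin(3.0 * x_train) + 0.05 * rng.standard_normal(40)

lam = 1e-3                                   # regularization parameter
G = gram(x_train, x_train)
w = np.linalg.solve(G + lam * np.eye(len(x_train)), y_train)

# The trained network is a weighted sum of displaced kernel replicas:
x_test = np.linspace(-1.0, 1.0, 5)
f_test = gram(x_test, x_train) @ w
print(np.round(f_test, 3))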
Yet another approach to the identification and approximation of nonlinear mappings is the one initiated by Vapnik in the 1960s. This theoretical framework is known as statistical learning theory [8]. The theory was first developed for two-class classifiers. Here, the idea is to map the available training examples from the input space into a higher dimensional vector space, called the feature space, in such a way that the problem becomes linearly separable there. Thereafter, a hyperplane is fitted in the feature space to separate the two classes maximally. The expression for this nonlinear classifier has a three-layer structure. Discarding the binary decision element in the output layer exposes the linear-in-the-weights structure that remains. Furthermore, only those centers closest to the separating hyperplane (the so-called support vectors) appear in the expression of the classifier, which contributes to good generalization ability. All of this is a direct consequence of the Kuhn–Tucker complementarity conditions in optimization theory [9]. The approximation networks implementing these classifiers are referred to as support vector machines (SVMs). An important insight gained from this approach is that the mapping from the input space to the feature space bears a direct relation to the kernel functions used in these classifiers. Recently, statistical learning theory has been extended to include the problem of function approximation (see [10] and the references therein). The greatest concern with statistical learning theory is the computational complexity of the quadratic programming problem that needs to be solved to obtain the weight vector of the SVM.
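The linear-in-the-weights structure of such a classifier can be seen directly from its decision function, f(x) = sum_i alpha_i y_i K(s_i, x) + b, in which the sum runs only over the support vectors s_i. The short sketch below (which uses scikit-learn purely as an illustration; the kernel, its parameters, and the data are assumptions made for the example) reconstructs this expansion from a trained SVM and verifies that only the support vectors contribute.

import numpy as np
from sklearn.svm import SVC

# Illustrative sketch: the kernel SVM decision function is
#   f(x) = sum_i alpha_i y_i K(s_i, x) + b,
# a weighted sum over the support vectors s_i only. The RBF kernel,
# gamma, C, and the synthetic data are assumptions for this example.

rng = np.random.default_rng(2)
X = rng.standard_normal((200, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0).astype(int)  # not linearly separable

clf = SVC(kernel="rbf", gamma=1.0, C=10.0).fit(X, y)
print("support vectors:", len(clf.support_vectors_), "of", len(X))

def rbf(a, b, gamma=1.0):
    return np.exp(-gamma * np.sum((a - b) ** 2, axis=-1))

# Rebuild the decision function from the dual coefficients alpha_i*y_i:
x = rng.standard_normal(2)
f = clf.dual_coef_ @ rbf(clf.support_vectors_, x) + clf.intercept_
print(np.allclose(f, clf.decision_function(x[None, :])))  # True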
Although both regularization networks and statistical learning theory draw on aspects of RKHSs, neither includes the perspective addressed by our approach. This approach, which generalizes the ideas of de Figueiredo [5], provides a powerful geometric interpretation that is not obtainable with either regularization networks or statistical learning theory.