A Framework for Multiscale and Hybrid RKHS-Based
Approximators
Michaël A. van Wyk, Member, IEEE, and Tariq S. Durrani, Fellow, IEEE
Abstract—A generalized framework for deriving multiscale and hybrid functionally expanded approximators that are linear in the adjustable weights is presented. The basic idea is to define one or more appropriate function spaces and then to impose a geometric structure on these to obtain reproducing kernel Hilbert spaces (RKHSs) [1]. The weight identification problem is formulated as a minimum norm optimization problem that produces an approximation network structure comprising a linear weighted sum of displaced reproducing kernels fed by the input signals. Examples of the application of this framework are discussed. Results of numerical experiments are presented.
Index Terms—Approximation networks, neural networks, nonlinear functional approximation, reproducing kernel Hilbert spaces.
I. INTRODUCTION
RECENTLY, several authors have studied the problem of identification and approximation of nonlinear mappings using approximation networks that are linear in the parameters. A generalized Fock space framework for the analysis and identification of nonlinear mappings described by Volterra series and nonlinear dynamical systems of the type described by Volterra series operators was proposed by de Figueiredo et al. [2], [3]. Specifically, this identification procedure is based on the construction of an RKHS, namely, a symmetric Fock space, for the Volterra series mapping (or operator, respectively). The significance of de Figueiredo's framework is that it produces a representation of the nonlinear mapping or system that may be viewed as a three-layer linear-in-the-parameters approximator with the output layer consisting of a single summation. This makes these approximators suitable for adaptive learning using linear adaptive filter algorithms.
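To make this structure concrete, the following minimal sketch (an illustration under stated assumptions, not the construction of [2] and [3]) trains an approximator of the form f(x) = sum_i w_i K(x, c_i) with the LMS rule; the Gaussian kernel, the grid of centers, the target mapping, and the step size are all assumptions made for the example.

import numpy as np

# Illustrative sketch: a linear-in-the-weights kernel approximator
#   f(x) = sum_i w_i * K(x, c_i),
# trained with the LMS rule. The Gaussian kernel, the fixed grid of
# centers c_i, the target mapping, and the step size mu are all
# assumptions made for this example.

def gaussian_kernel(x, c, width=0.3):
    return np.exp(-((x - c) ** 2) / (2.0 * width ** 2))

rng = np.random.default_rng(0)
centers = np.linspace(-1.0, 1.0, 15)        # displaced kernel centers
weights = np.zeros_like(centers)            # adjustable weights
mu = 0.1                                    # LMS step size

def target(x):                              # "unknown" mapping to identify
    return np.sin(3.0 * x)

for _ in range(2000):
    x = rng.uniform(-1.0, 1.0)              # training sample
    phi = gaussian_kernel(x, centers)       # hidden-layer kernel outputs
    err = target(x) - weights @ phi         # output error
    weights += mu * err * phi               # LMS weight update

Because the model is linear in the weights, any linear adaptive filter algorithm (LMS above, or RLS) applies unchanged; only the hidden layer of displaced kernels is nonlinear.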
Poggio and Girosi [6] developed a framework for the approximation of nonlinear mappings based on regularization techniques [7]. Their approach leads to a class of three-layer approximation networks called regularization networks, which includes radial basis function (RBF) networks as a special case. First, the regularization functional for the given approximation problem is constructed. It contains a differential operator that
embodies the a priori information about the solution. Minimization of this functional with respect to the approximating function leads to the associated Euler–Lagrange equation, which is a partial differential equation whose solution may be written as an integral transform. This integral transform turns out to be a weighted sum of displaced replicas of the Green's function of the adjoint operator composed with the above-mentioned differential operator. These approximation networks are therefore linear in the weights. In [6], Poggio and Girosi also present an adaptive learning algorithm that adapts both the weights and the kernel centers in search of the optimal solution. The main drawback of regularization networks is that complex differential equations have to be solved to obtain the kernel function (or activation function) for the approximation problem.
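As a concrete illustration of the resulting network structure (not of the derivation itself), the sketch below fits a regularization network whose kernel is taken to be Gaussian, with the centers placed at the training inputs; the weights then solve the regularized linear system (G + lambda*I)w = y, where G is the Gram matrix of the kernel. The kernel width, the regularization parameter, and the data are assumptions made for the example.

import numpy as np

# Illustrative sketch of a regularization (RBF) network: with centers
# at the training inputs, the weights solve (G + lam*I) w = y, where
# G is the Gram matrix of the kernel. The Gaussian kernel, its width,
# lam, and the synthetic data are assumptions made for this example.

def gram(xa, xb, width=0.2):
    d = xa[:, None] - xb[None, :]
    return np.exp(-d ** 2 / (2.0 * width ** 2))

rng = np.random.default_rng(1)
x_train = np.sort(rng.uniform(-1.0, 1.0, 40))
y_train = np.sin(3.0 * x_train) + 0.05 * rng.standard_normal(40)

lam = 1e-3                                   # regularization parameter
G = gram(x_train, x_train)
w = np.linalg.solve(G + lam * np.eye(len(x_train)), y_train)

# The trained network is a weighted sum of displaced kernel replicas:
x_test = np.linspace(-1.0, 1.0, 5)
f_test = gram(x_test, x_train) @ w
print(np.round(f_test, 3))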
Yet another approach to the identification and approximation of nonlinear mappings is the one initiated by Vapnik in the 1960s. This theoretical framework is known as statistical learning theory [8]. The theory was first developed for two-class classifiers. Here, the idea is to map the available training examples from the input space into a higher dimensional vector space, called the feature space, in such a way that the problem becomes linearly separable there. Thereafter, a hyperplane is fitted in the feature space to separate the two classes maximally. The expression for this nonlinear classifier has a three-layer structure. Discarding the binary decision element in the output layer exposes the linear-in-the-weights structure that remains. Furthermore, only those centers closest to the separating hyperplane (the so-called support vectors) appear in the expression of the classifier, which contributes to good generalization ability. All of this is a direct consequence of the Kuhn–Tucker complementarity conditions in optimization theory [9]. The approximation networks implementing these classifiers are referred to as support vector machines (SVMs). An important insight gained from this approach is that the mapping from the input space to the feature space bears a direct relation to the kernel functions used in these classifiers. Recently, statistical learning theory has been extended to include the problem of function approximation (see [10] and the references therein). The greatest concern with statistical learning theory is the computational complexity of the quadratic programming problem that needs to be solved to obtain the weight vector of the SVM.
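The linear-in-the-weights structure of such a classifier can be seen directly from its decision function, f(x) = sum_i alpha_i y_i K(s_i, x) + b, in which the sum runs only over the support vectors s_i. The short sketch below (which uses scikit-learn purely as an illustration; the kernel, its parameters, and the data are assumptions made for the example) reconstructs this expansion from a trained SVM and verifies that only the support vectors contribute.

import numpy as np
from sklearn.svm import SVC

# Illustrative sketch: the kernel SVM decision function is
#   f(x) = sum_i alpha_i y_i K(s_i, x) + b,
# a weighted sum over the support vectors s_i only. The RBF kernel,
# gamma, C, and the synthetic data are assumptions for this example.

rng = np.random.default_rng(2)
X = rng.standard_normal((200, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0).astype(int)  # not linearly separable

clf = SVC(kernel="rbf", gamma=1.0, C=10.0).fit(X, y)
print("support vectors:", len(clf.support_vectors_), "of", len(X))

def rbf(a, b, gamma=1.0):
    return np.exp(-gamma * np.sum((a - b) ** 2, axis=-1))

# Rebuild the decision function from the dual coefficients alpha_i*y_i:
x = rng.standard_normal(2)
f = clf.dual_coef_ @ rbf(clf.support_vectors_, x) + clf.intercept_
print(np.allclose(f, clf.decision_function(x[None, :])))  # True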
Although both regularization networks and statistical learning theory draw on aspects of RKHSs, neither includes the perspective addressed by our approach. This approach, which generalizes the ideas of de Figueiredo [5], provides a powerful geometric interpretation that is not obtainable with either regularization networks or statistical learning theory.