IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 23, NO. 3, MARCH 2012 425
Adaptive Learning in Complex Reproducing Kernel
Hilbert Spaces Employing Wirtinger’s Subgradients
Pantelis Bouboulis, Member, IEEE, Konstantinos Slavakis, Member, IEEE, and Sergios Theodoridis, Fellow, IEEE
Abstract—This paper presents a general framework for nonlinear online supervised learning tasks in the context of complex-valued signal processing. The (complex) input data are mapped into a complex reproducing kernel Hilbert space (RKHS), where the learning phase takes place. Both pure complex kernels and real kernels (via the complexification trick) can be employed. Moreover, any convex, continuous, and not necessarily differentiable function can be used to measure the loss between the output of the specific system and the desired response. The only requirement is that the subgradient of the adopted loss function be available in an analytic form. In order to derive the subgradients analytically, the principles of the (recently developed) Wirtinger's calculus in complex RKHS are exploited. Furthermore, both linear and widely linear (in the RKHS) estimation filters are considered. To cope with the problem of increasing memory requirements, which is present in almost all online schemes in RKHS, a sparsification scheme based on projections onto closed balls is adopted. We demonstrate the effectiveness of the proposed framework in a nonlinear channel identification task, a nonlinear channel equalization problem, and a quadrature phase shift keying equalization scheme, using both circular and noncircular synthetic signal sources.
Index Terms—Adaptive kernel learning, complex kernels, projection, subgradient, widely linear estimation, Wirtinger's calculus.
I. INTRODUCTION

KERNEL-based methods have been successfully applied in many classification, regression, and estimation tasks in a variety of scientific domains, ranging from pattern recognition, image and signal processing, to biology and nuclear physics [1]–[24]. Their appeal lies mainly in the solid and efficient mathematical framework upon which they rely: the theory of reproducing kernel Hilbert spaces (RKHS) [25], [26]. The main advantage of employing this powerful tool of
RKHS is that it offers an elegant way to transform a nonlinear task (in a low-dimensional space) into a linear one, which is performed in a high-dimensional (possibly infinite-dimensional) space and can be solved by employing a simpler "algebra."

Manuscript received April 26, 2011; revised September 28, 2011; accepted December 3, 2011. Date of publication January 9, 2012; date of current version February 29, 2012.
P. Bouboulis is with the Department of Informatics and Telecommunications, University of Athens, Athens 15784, Greece (e-mail: pbouboulis@sch.gr).
K. Slavakis is with the Department of Telecommunications Science and Technology, University of Peloponnese, Tripolis 22100, Greece (e-mail: slavakis@uop.gr).
S. Theodoridis is with the Department of Informatics and Telecommunications, University of Athens, Athens 15784, Greece. He is also with the Research Academic Computer Technology Institute, Patra 26504, Greece (e-mail: stheodor@di.uoa.gr).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TNNLS.2011.2179810
Usually, this process is described through the popular kernel trick [1], [2]:

"Given an algorithm, which can be formulated in terms of dot (inner) products, one can construct an alternative algorithm by replacing each one of the dot products with a positive definite kernel κ."
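As a concrete illustration of the trick (a minimal sketch, not an algorithm from this paper), consider a nearest-centroid classifier: the squared distance to a class centroid expands into dot products only, so replacing each dot product with a positive definite kernel, e.g., a Gaussian one, yields the nonlinear counterpart of the same algorithm.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    """Positive definite Gaussian (RBF) kernel."""
    return np.exp(-np.linalg.norm(x - y) ** 2 / (2 * sigma ** 2))

def nearest_centroid_predict(x, class_a, class_b, kernel=np.dot):
    """Classify x by its (kernelized) distance to the two class centroids.

    ||phi(x) - mean_a phi(a)||^2 = k(x, x) - 2/n * sum_a k(x, a)
                                   + 1/n^2 * sum_{a, a'} k(a, a'),
    so the algorithm needs only kernel evaluations, never phi itself.
    """
    def dist2(x, cls):
        n = len(cls)
        return (kernel(x, x)
                - 2.0 * sum(kernel(x, a) for a in cls) / n
                + sum(kernel(a, b) for a in cls for b in cls) / n ** 2)
    return 'A' if dist2(x, class_a) < dist2(x, class_b) else 'B'
```

With the default `kernel=np.dot`, this is the ordinary linear classifier; passing `kernel=gaussian_kernel` turns it into a nonlinear one without changing a single line of the algorithm.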
Although this trick works well for most applications, it conceals the basic mathematical steps that underlie the procedure, which are essential if one seeks a deeper understanding of the problem. These steps are: 1) map the finite-dimensional input data from the input space F (usually F ⊂ R^ν) into a higher dimensional (possibly infinite-dimensional) RKHS H; and 2) perform linear processing (e.g., adaptive filtering) on the mapped data in H. This procedure is equivalent to nonlinear processing (nonlinear filtering) in F. The specific choice of the kernel κ implicitly defines an RKHS with an appropriate inner product. Moreover, the specific choice of the kernel determines the type of nonlinearity that underlies the model to be used.
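A toy example of steps 1) and 2), assuming for concreteness the homogeneous polynomial kernel κ(x, y) = (x·y)² on R², whose feature map into H = R³ happens to be writable in closed form: the kernel evaluates the inner product in H without the map φ ever being formed explicitly.

```python
import numpy as np

def poly2_kernel(x, y):
    """Homogeneous polynomial kernel of degree 2: kappa(x, y) = (x . y)^2."""
    return np.dot(x, y) ** 2

def phi(x):
    """Explicit feature map F = R^2 -> H = R^3 associated with poly2_kernel."""
    return np.array([x[0] ** 2, x[1] ** 2, np.sqrt(2) * x[0] * x[1]])

x, y = np.array([1.0, 2.0]), np.array([3.0, -1.0])
# Inner product in H equals the kernel evaluation in the input space:
assert np.isclose(poly2_kernel(x, y), np.dot(phi(x), phi(y)))
```

For kernels such as the Gaussian one, H is infinite-dimensional and no such explicit φ is available, which is exactly why the implicit evaluation matters.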
Undeniably, the flagship of the so-called kernel methods is the popular support vector machine (SVM) paradigm [1]–[4]. It was developed by Vapnik and Chervonenkis in the 1960s and, in its original form, was a linear classifier. However, with the incorporation of kernels it became a powerful nonlinear processing tool with excellent generalization properties, as substantiated by strong theoretical arguments in the context of statistical learning theory [3] and verified in practice, e.g., [2].
Motivated mainly by the success of SVMs in classification problems, a large number of kernel-based methods have emerged in various domains. However, most of these methods relate to batch processing, where all the necessary data are available beforehand. Over the last five years, significant effort has been devoted to the development of online kernel methods for adaptive learning (e.g., adaptive filtering) [5]–[12], where the data arrive sequentially. However, all the aforementioned kernel methods (batch and online) target applications involving real-valued data sequences.
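For a flavor of such online kernel methods (a minimal real-valued sketch in the spirit of kernel LMS, not the algorithm proposed in this paper), the current estimate can be kept as a growing kernel expansion and updated as each sample arrives; the unbounded growth of this expansion is precisely the memory problem that sparsification schemes are designed to curb.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    return np.exp(-np.linalg.norm(np.asarray(x) - np.asarray(y)) ** 2
                  / (2 * sigma ** 2))

class KernelLMS:
    """Minimal kernel LMS sketch: the filter is stored as the expansion
    f(x) = sum_i alpha_i * kappa(c_i, x), which grows by one term per sample."""

    def __init__(self, step=0.5, kernel=gaussian_kernel):
        self.step, self.kernel = step, kernel
        self.centers, self.alphas = [], []

    def predict(self, x):
        return sum(a * self.kernel(c, x)
                   for c, a in zip(self.centers, self.alphas))

    def update(self, x, d):
        """One online step: predict, form the error e = d - f(x),
        and append a new expansion term weighted by the scaled error."""
        e = d - self.predict(x)
        self.centers.append(x)
        self.alphas.append(self.step * e)
        return e
```

Without a sparsification rule, `centers` grows linearly with the number of processed samples, which motivates schemes such as the projection-based one adopted in this paper.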
Complex-valued signals arise frequently in applications as diverse as communications, biomedicine, and radar. The complex domain not only provides a convenient and elegant representation for these signals, but also a natural way to preserve their characteristics and to handle the transformations that need to be performed. Therefore, it is natural to wonder whether we should be able to apply the machinery of kernels to
2162–237X/$31.00 © 2012 IEEE