IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 23, NO. 3, MARCH 2012

Adaptive Learning in Complex Reproducing Kernel Hilbert Spaces Employing Wirtinger's Subgradients

Pantelis Bouboulis, Member, IEEE, Konstantinos Slavakis, Member, IEEE, and Sergios Theodoridis, Fellow, IEEE

Abstract—This paper presents a wide framework for nonlinear online supervised learning tasks in the context of complex-valued signal processing. The (complex) input data are mapped into a complex reproducing kernel Hilbert space (RKHS), where the learning phase takes place. Both pure complex kernels and real kernels (via the complexification trick) can be employed. Moreover, any convex, continuous, and not necessarily differentiable function can be used to measure the loss between the output of the specific system and the desired response; the only requirement is that a subgradient of the adopted loss function be available in analytic form. In order to derive the subgradients analytically, the principles of the (recently developed) Wirtinger's calculus in complex RKHS are exploited. Furthermore, both linear and widely linear (in RKHS) estimation filters are considered. To cope with the problem of increasing memory requirements, which is present in almost all online schemes in RKHS, a sparsification scheme based on projection onto closed balls has been adopted. We demonstrate the effectiveness of the proposed framework in a nonlinear channel identification task, a nonlinear channel equalization problem, and a quadrature phase shift keying equalization scheme, using both circular and noncircular synthetic signal sources.

Index Terms—Adaptive kernel learning, complex kernels, projection, subgradient, widely linear estimation, Wirtinger's calculus.

Manuscript received April 26, 2011; revised September 28, 2011; accepted December 3, 2011. Date of publication January 9, 2012; date of current version February 29, 2012. P. Bouboulis is with the Department of Informatics and Telecommunications, University of Athens, Athens 15784, Greece (e-mail: pbouboulis@sch.gr). K. Slavakis is with the Department of Telecommunications Science and Technology, University of Peloponnese, Tripolis 22100, Greece (e-mail: slavakis@uop.gr). S. Theodoridis is with the Department of Informatics and Telecommunications, University of Athens, Athens 15784, Greece. He is also with the Research Academic Computer Technology Institute, Patra 26504, Greece (e-mail: stheodor@di.uoa.gr). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TNNLS.2011.2179810

I. INTRODUCTION

Kernel-based methods have been successfully applied in many classification, regression, and estimation tasks in a variety of scientific domains, ranging from pattern recognition, image and signal processing to biology and nuclear physics [1]–[24]. Their appeal lies mainly in the solid and efficient mathematical background on which they rely: the theory of reproducing kernel Hilbert spaces (RKHS) [25], [26]. The main advantage of mobilizing this powerful tool of RKHS is that it offers an elegant way to transform a nonlinear task (in a low-dimensional space) into a linear one that is performed in a high-dimensional (possibly infinite-dimensional) space, where it can be solved by employing an easier "algebra." Usually, this process is described through the popular kernel trick [1], [2].
"Given an algorithm which can be formulated in terms of dot (inner) products, one can construct an alternative algorithm by replacing each of the dot products with a positive definite kernel κ."

Although this trick works well for most applications, it conceals the basic mathematical steps that underlie the procedure, which are essential if one seeks a deeper understanding of the problem. These steps are: 1) map the finite-dimensional input data from the input space F (usually F ⊆ ℝ^ν) into a higher dimensional (possibly infinite-dimensional) RKHS H; and 2) perform linear processing (e.g., adaptive filtering) on the mapped data in H. This procedure is equivalent to a nonlinear processing (nonlinear filtering) in F. The specific choice of the kernel κ implicitly defines an RKHS with an appropriate inner product; moreover, it determines the type of nonlinearity that underlies the model to be used.

Undeniably, the flagship of the so-called kernel methods is the popular support vector machine (SVM) paradigm [1]–[4]. It was developed by Vapnik and Chervonenkis in the 1960s and, in its original form, was a linear classifier. However, with the incorporation of kernels it became a powerful nonlinear processing tool with excellent generalization properties, as substantiated by strong theoretical arguments in the context of statistical learning theory [3] and as verified in practice, e.g., [2]. Motivated mainly by the success of SVMs in classification problems, a large number of kernel-based methods emerged in various domains. However, most of these methods relate to batch processing, where all necessary data are available beforehand. Over the last five years, significant efforts have been devoted to the development of online kernel methods for adaptive learning (e.g., adaptive filtering) [5]–[12], where the data arrive sequentially.
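As a hypothetical illustration of the two steps above (not taken from this paper), the following sketch shows, for the homogeneous degree-2 polynomial kernel on ℝ², that a kernel evaluation in the input space F coincides with an inner product after an explicit feature map into a higher-dimensional space — the equivalence that the kernel trick exploits without ever forming the map. The kernel choice and the function names `poly_kernel` and `phi` are our own for illustration.

```python
import numpy as np

def poly_kernel(x, y):
    """Homogeneous degree-2 polynomial kernel k(x, y) = (x^T y)^2."""
    return float(np.dot(x, y)) ** 2

def phi(x):
    """Explicit feature map into R^3 associated with k on R^2:
    phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2)."""
    return np.array([x[0] ** 2, np.sqrt(2.0) * x[0] * x[1], x[1] ** 2])

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])

lhs = poly_kernel(x, y)               # kernel evaluated in the input space
rhs = float(np.dot(phi(x), phi(y)))   # inner product in the feature space
assert np.isclose(lhs, rhs)           # same value, no explicit map needed
```

In practice the feature space may be infinite-dimensional (e.g., for the Gaussian kernel), so only the kernel evaluation on the left-hand side is computable; the linear processing of step 2 is carried out implicitly through such kernel evaluations.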
However, all the aforementioned kernel methods (batch and online) were targeted at applications involving real-valued data sequences. Complex-valued signals arise frequently in applications as diverse as communications, biomedicine, and radar. The complex domain not only provides a convenient and elegant representation for these signals, but also a natural way to preserve their characteristics and to handle the transformations that need to be performed. Therefore, it is natural to wonder whether we should be able to apply the machinery of kernels to