Mean square convergence analysis for kernel least mean square algorithm

Badong Chen*, Songlin Zhao, Pingping Zhu, José C. Príncipe

Department of Electrical and Computer Engineering, University of Florida, Gainesville, FL 32611, USA

Article history: Received 8 October 2011; received in revised form 12 March 2012; accepted 12 April 2012; available online 24 April 2012.

Keywords: Kernel adaptive filter; Kernel least mean square; Energy conservation relation; Mean square convergence

Abstract

In this paper, we study the mean square convergence of the kernel least mean square (KLMS) algorithm. The fundamental energy conservation relation is established in the feature space. Starting from this relation, we carry out a mean square convergence analysis and obtain several important theoretical results, including an upper bound on the step size that guarantees mean square convergence, the theoretical steady-state excess mean square error (EMSE), an optimal step size for the fastest convergence, and an optimal kernel size for the fastest initial convergence. Monte Carlo simulation results agree very well with the theoretical analysis.

© 2012 Elsevier B.V. All rights reserved.

1. Introduction

During the past decade, kernel methods [1] have gained increasing popularity, driven by their successful applications in areas such as machine learning, signal processing, and biomedical engineering. Their advantages include universal nonlinear approximation, convexity in the hypothesis space, and simplicity and elegance in implementing nonlinear algorithms: by the kernel trick, one can easily compute inner products in the feature space without an explicit mapping from the input space to the feature space. Typical kernel-based algorithms include the support vector machine (SVM) [2], the kernel regularization network [3], kernel principal component analysis (KPCA) [4], etc.
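The kernel trick mentioned above can be made concrete with a small sketch. For a degree-2 polynomial kernel on 2-D inputs, the feature map is known in closed form, so one can check numerically that the kernel value computed in the input space equals the inner product computed in the feature space. The functions `poly2_kernel` and `phi` below are illustrative names, not notation from this paper.

```python
import numpy as np

def poly2_kernel(x, y):
    # k(x, y) = (x . y)^2, computed directly in the input space
    return float(np.dot(x, y)) ** 2

def phi(x):
    # Explicit feature map for the 2-D degree-2 polynomial kernel:
    # phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2)
    return np.array([x[0] ** 2, np.sqrt(2.0) * x[0] * x[1], x[1] ** 2])

x, y = np.array([1.0, 2.0]), np.array([3.0, -1.0])
# Same value either way: the kernel evaluates the feature-space inner
# product without ever constructing phi.
print(poly2_kernel(x, y))            # -> 1.0
print(np.dot(phi(x), phi(y)))        # -> 1.0
```

For the Gaussian kernel used throughout the kernel adaptive filtering literature, the feature space is infinite-dimensional, so no such explicit `phi` exists; the kernel trick is then not merely a convenience but a necessity.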
Recently, a family of online kernel-learning algorithms, known as kernel adaptive filtering algorithms [5], has become an emerging area of research. Kernel adaptive filters are developed in reproducing kernel Hilbert spaces (RKHS) [6,7], using the linear structure of this space to implement well-established linear adaptive algorithms and to obtain nonlinear filters in the input space. These algorithms include the kernel least mean square (KLMS) [8], kernel affine projection algorithms (KAPA) [9], kernel recursive least squares (KRLS) [10], extended kernel recursive least squares (EX-KRLS) [11], etc. Among these algorithms, KLMS is the simplest: it naturally creates a growing radial-basis function (RBF) network, learning the network topology and adapting the free parameters directly from the training data.

The main bottleneck of kernel adaptive filtering algorithms is their structure, which grows with each sample and poses both computational and memory issues, especially in continuous adaptation scenarios. To curb the growth of the networks, a variety of sparsification techniques have been proposed. Existing sparsification criteria include the novelty criterion [12], the approximate linear dependency (ALD) criterion [10], the surprise criterion [13], etc.

The convergence behavior is another key aspect of kernel adaptive filters. For classical linear adaptive filters, convergence analysis has been studied extensively [14,15]. In this direction, we mention the works of Sayed [15], Al-Naffouri and Sayed [16–18], and Yousef and Sayed [19], whose approach is based on the fundamental energy conservation relation.

* Corresponding author. E-mail address: chenbd04@mails.tsinghua.edu.cn (B. Chen). doi:10.1016/j.sigpro.2012.04.007
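The growing RBF network that KLMS builds can be sketched in a few lines: each training sample allocates one new Gaussian unit centered at the input, with coefficient equal to the step size times the prediction error. This is a minimal sketch with no sparsification; the parameter names (`eta` for the step size, `h` for the kernel size) are illustrative choices, not the paper's notation.

```python
import numpy as np

class KLMS:
    """Minimal kernel least mean square sketch (Gaussian kernel)."""

    def __init__(self, eta=0.5, h=0.5):
        self.eta = eta      # step size
        self.h = h          # Gaussian kernel size
        self.centers = []   # stored inputs (RBF centers)
        self.coeffs = []    # RBF coefficients

    def _kernel(self, x, c):
        # Gaussian kernel k(x, c) = exp(-||x - c||^2 / (2 h^2))
        return np.exp(-np.sum((x - c) ** 2) / (2.0 * self.h ** 2))

    def predict(self, x):
        return sum(a * self._kernel(x, c)
                   for a, c in zip(self.coeffs, self.centers))

    def update(self, x, d):
        # LMS in feature space: allocate one new RBF unit per sample,
        # centered at x, with coefficient eta * (prediction error).
        x = np.asarray(x, dtype=float)
        e = d - self.predict(x)
        self.centers.append(x)
        self.coeffs.append(self.eta * e)
        return e

# Learn a simple nonlinearity online: d = sin(3x) on [-1, 1]
rng = np.random.default_rng(0)
f = KLMS(eta=0.5, h=0.5)
xs = rng.uniform(-1.0, 1.0, size=(300, 1))
errors = [abs(f.update(x, np.sin(3.0 * x[0]))) for x in xs]
```

The sketch makes the bottleneck discussed above visible: `centers` holds one entry per training sample, so both the memory footprint and the cost of `predict` grow linearly with the number of samples processed, which is what the sparsification criteria aim to curb.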
Signal Processing 92 (2012) 2624–2632