A COMPLEX ECHO STATE NETWORK FOR NONLINEAR ADAPTIVE FILTERING

Yili Xia 1, Danilo P. Mandic 1, Marc M. Van Hulle 2 and Jose C. Principe 3

1 Department of Electrical and Electronic Engineering, Imperial College London, Exhibition Road, London, SW7 2BT, U.K. {yili.xia06, d.mandic}@imperial.ac.uk
2 Laboratorium voor Neurofysiologie, K.U.Leuven, Campus Gasthuisberg, Herestraat 49, B-3000 Leuven, Belgium marc@neuro.kuleuven.be
3 Department of Electrical Engineering, University of Florida, Gainesville, FL 32611, USA principe@cnel.ufl.edu

ABSTRACT

The operation of Echo State Networks (ESNs) is extended to the complex domain, in order to perform nonlinear complex-valued adaptive filtering of nonlinear and nonstationary signals. This is achieved by introducing a nonlinear output layer into an ESN, whereby full adaptivity is provided by an adaptive amplitude within the nonlinear activation function of the output layer. This allows us to control and track the degree of nonlinearity, which facilitates real-world adaptive filtering applications. Learning algorithms for such an ESN are derived, and the benefits of combining sparse connections with a nonlinear adaptive output layer are illustrated by simulations on both benchmark and real-world signals.

1. INTRODUCTION

Recent research has illustrated the power of recurrent neural networks (RNNs) as universal approximators of any continuous function on a compact domain [1, 2]. It has also been shown that RNNs are capable of outperforming linear adaptive filtering methods in the nonlinear adaptive prediction of real-world data with complex nonlinear dynamics [3, 4]. Compared with static feedforward networks, RNNs employ rich internal nonlinear dynamics, which offers potentially better performance [5], and they have been widely used in system identification and time series prediction.
However, the main problem with RNNs is the computational complexity associated with the updates of the weight matrix. The training and analysis of RNNs are not straightforward [3] and suffer from a variety of problems, such as slow training, resulting from the computational complexity, and the possibility of instability. To address some of these problems, a class of discrete-time RNNs, called Echo State Networks (ESNs), has recently been considered [6]. The idea behind ESNs is to separate the RNN architecture into two constituent parts: a recurrent internal layer, called the dynamical reservoir, and a memoryless linear output layer, called the readout neuron. The dynamical reservoir satisfies the so-called echo state property, which is guaranteed by generating a large random internal network of neurons with a specific degree of sparsity. By employing a sparse and fixed hidden layer and training only the weights connecting the internal layer to the readout neuron, the ESN approach significantly reduces the complexity of the representation. ESNs have been developed for operation in R; however, due to a number of emerging applications in the complex domain C, it is natural to extend ESNs to this domain. The first extension of this type was proposed in [7], with a linear output mapping and the learning algorithms directly extended to the complex domain. However, to perform nonlinear adaptive filtering of complex-valued nonlinear and nonstationary signals, the degree of nonlinearity of the standard ESN structure may not be sufficient, owing to the linearity of the output mapping and the relative simplicity of the structure. To that end, starting from the complex ESN of [7], we introduce a nonlinear output layer, that is, a linear mapping followed by a nonlinear activation function, to provide a higher degree of nonlinearity within the structure.
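The ESN principle described above, a large fixed random reservoir with only the readout weights trained, can be sketched in a few lines. The following is a minimal illustrative example (the dimensions, sparsity level, input signal, and all variable names are assumptions made for illustration); it scales the reservoir matrix to a spectral radius below unity, a commonly used sufficient condition for the echo state property, and fits a linear readout by ordinary least squares rather than an online algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: 100 reservoir neurons, scalar input, 10% connectivity.
N_res, N_in, sparsity = 100, 1, 0.1

# Sparse random reservoir matrix, rescaled so its spectral radius is 0.9.
W = rng.standard_normal((N_res, N_res))
W[rng.random((N_res, N_res)) > sparsity] = 0.0
W *= 0.9 / max(abs(np.linalg.eigvals(W)))

W_in = rng.standard_normal((N_res, N_in))   # fixed, untrained input weights

def run_reservoir(u):
    """Drive the fixed reservoir with input sequence u, collect states x(n)."""
    x = np.zeros(N_res)
    states = []
    for u_n in u:
        x = np.tanh(W @ x + W_in @ np.atleast_1d(u_n))
        states.append(x.copy())
    return np.array(states)

# One-step-ahead prediction of a toy signal: only the linear readout
# (w_out) is trained, here by batch least squares on the reservoir states.
u = np.sin(0.3 * np.arange(301))
X = run_reservoir(u[:-1])                   # states driven by u(0)..u(299)
w_out, *_ = np.linalg.lstsq(X, u[1:], rcond=None)
y_hat = X @ w_out                           # linear readout output
```

Because the reservoir weights `W` and `W_in` stay fixed, training reduces to a linear regression on the readout, which is the source of the complexity reduction noted above.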
Whereas the complex ESN can be trained by the complex least mean square (CLMS) algorithm [8], the extension of learning algorithms from R to C for nonlinear structures is not trivial, since there are several open problems in the design of complex-valued nonlinear adaptive filters, such as the choice of the complex nonlinearities within the model. According to Liouville's theorem, the only functions that are bounded and analytic on the whole of C are constants; to this end, meromorphic functions have been employed as complex nonlinear activation functions, since they are analytic almost everywhere, except on a discrete subset of C. Previous re-