Neural Networks 78 (2016) 65–74

2016 Special Issue

A decentralized training algorithm for Echo State Networks in distributed big data applications

Simone Scardapane (a,*), Dianhui Wang (b), Massimo Panella (a)

(a) Department of Information Engineering, Electronics and Telecommunications (DIET), "Sapienza" University of Rome, Via Eudossiana 18, 00184 Rome, Italy
(b) Department of Computer Science and Information Technology, La Trobe University, Melbourne, VIC 3086, Australia

Article history: Available online 18 August 2015

Keywords: Recurrent neural network; Echo State Network; Distributed learning; Alternating Direction Method of Multipliers; Big data

Abstract

The current big data deluge requires innovative solutions for performing efficient inference on large, heterogeneous amounts of information. Apart from the known challenges deriving from high volume and velocity, real-world big data applications may impose additional technological constraints, including the need for a fully decentralized training architecture. While several alternatives exist for training feed-forward neural networks in such a distributed setting, less attention has been devoted to the decentralized training of recurrent neural networks (RNNs). In this paper, we propose such an algorithm for a class of RNNs known as Echo State Networks. The algorithm is based on the well-known Alternating Direction Method of Multipliers optimization procedure. It is formulated only in terms of local exchanges between neighboring agents, without reliance on a coordinating node. Additionally, it does not require the communication of training patterns, which is a crucial requirement in realistic big data implementations.
Experimental results on large-scale artificial datasets show that it compares favorably with a fully centralized implementation in terms of speed, efficiency, and generalization accuracy.

© 2015 Elsevier Ltd. All rights reserved.

1. Introduction

With 2.5 quintillion bytes of data generated every day, we are undoubtedly in an era of 'big data' (Wu, Zhu, Wu, & Ding, 2014). Amidst the challenges put forth to the machine learning community by this big data deluge, much effort has been devoted to efficiently analyzing large amounts of data by exploiting parallel and concurrent infrastructures (Cevher, Becker, & Schmidt, 2014; Chu, Kim, Lin, & Yu, 2007), and to taking advantage of its possibly structured nature (Bakir, 2007). In multiple real-world applications, however, the main issue is the overall decentralized nature of the data. In what we refer to as 'data-distributed learning' (Scardapane, Wang, Panella, & Uncini, 2015a), training data is not available at a centralized location; instead, large amounts of it are distributed throughout a network of interconnected agents (e.g., computers in a peer-to-peer (P2P) network). In practice, a solution relying on a centralized controller may be technologically unsuitable, since it can introduce a single point of failure and is prone to communication bottlenecks. Additionally, training data may not be allowed to be exchanged among the nodes,(1) either because of its size (as is typical in big data applications), or because particular privacy concerns are present (Verykios et al., 2004). Hence, the agents must agree on a single learned model (such as a specific neural network's topology and weights) by relying only on their local data and on local communication among themselves. In the words of Wu et al.

(*) Corresponding author. Tel.: +39 06 44585495; fax: +39 06 4873300. E-mail addresses: simone.scardapane@uniroma1.it (S. Scardapane), dh.wang@latrobe.edu.au (D. Wang), massimo.panella@uniroma1.it (M. Panella).
(2014), this can informally be understood as 'a number of blind men [...] trying to size up a giant elephant', where the giant elephant refers to the big data, and the blind men are the agents in the network. Although this analogy referred in general to data mining with big data, it is a fitting metaphor for the data-distributed learning setting considered in this paper, which is graphically depicted in Fig. 1.

With respect to neural-like architectures, several decentralized training algorithms have been investigated in the last few years. These include distributed strategies for training standard multilayer perceptrons with back-propagation (Georgopoulos & Hasler, 2014), support vector machines (Forero, Cano, & Giannakis,

(1) In the rest of the paper, we use the terms 'agent' and 'node' interchangeably, to refer to a single component of a network such as the one in Fig. 1.

doi:10.1016/j.neunet.2015.07.006