Delay Prediction System for Large-Scale Railway Networks based on Big Data Analytics

Luca Oneto 1,*, Emanuele Fumeo 1, Giorgio Clerico 1, Renzo Canepa 2, Federico Papa 3, Carlo Dambra 3, Nadia Mazzino 3, and Davide Anguita 1

1 DIBRIS - University of Genoa, Via Opera Pia 11A, Genova, I-16145, Italy
{luca.oneto,emanuele.fumeo,g.clerico,davide.anguita}@unige.it
2 Rete Ferroviaria Italiana S.p.A., Via Don Vincenzo Minetti 6/5, 16126 Genoa, Italy
r.canepa@rfi.it
3 Ansaldo STS S.p.A., Via Paolo Mantovani 3-5, 16151 Genoa, Italy
{federico.papa,carlo.dambra,nadia.mazzino}@ansaldo-sts.com

Abstract. State-of-the-art train delay prediction systems do not exploit the historical train movement data collected by railway information systems; instead, they rely on static rules built by railway infrastructure experts and based on classical univariate statistics. The purpose of this paper is to build a data-driven train delay prediction system for large-scale railway networks which exploits the most recent Big Data technologies and learning algorithms. In particular, we propose a fast learning algorithm for predicting train delays, based on the Extreme Learning Machine, that fully exploits recent in-memory large-scale data processing technologies. Our system is able to rapidly extract nontrivial information from the large amount of data available in order to make accurate predictions about different future states of the railway network. Results on real-world data coming from the Italian railway network show that our proposal is able to improve on the current state-of-the-art train delay prediction systems.

Keywords: Condition-Based Maintenance, Naval Propulsion Plant, Machine Learning, Publicly Distributed Dataset

1 Introduction

Big Data Analytics is one of the current trending research interests in the context of railway transportation systems.
Indeed, many aspects of the railway world can greatly benefit from new technologies and methodologies able to collect, store, process, analyze and visualize large amounts of data [34, 37], e.g., condition-based maintenance of railway assets [9, 25], alarm detection with wireless sensor networks [20], passenger information systems [23], risk analysis [8], and the like. In particular, this paper focuses on predicting train delays in order to improve traffic management and dispatching using Big Data Analytics, scaling to large railway networks.

* This research has been supported by the European Union through the projects Capacity4Rail (European Union’s Seventh Framework Programme for research, technological development and demonstration under grant agreement 605650) and In2Rail (European Union’s Horizon 2020 research and innovation programme under grant agreement 635900).