A Distributed Approach for Machine Learning in Large Scale Manufacturing Systems

Cristina Morariu, Silviu Răileanu, Theodor Borangiu, Florin Anton

Dept. of Automation and Applied Informatics, Univ. Politehnica of Bucharest, Romania
{cristina.morariu; theodor.borangiu; silviu.raileanu; florin.anton}@cimr.pub.ro

Abstract. Large scale manufacturing systems are capable of executing manufacturing operations across multiple product batches by coordinating many shop floor actors. Monitoring and processing the information flow from these systems in real time is essential for optimization and for detecting faults that might affect the production schedule. This paper proposes an architecture that uses big data concepts and map-reduce algorithms to process the information streams in large scale manufacturing systems, focusing on energy consumption aggregated at various layers. Once the information is aggregated in logical streams and consolidated based on relevant metadata, a neural network is trained to learn historical patterns in the data on each layer. This novel approach also allows accurate forecasting of energy consumption patterns during the production cycle by using Long Short-Term Memory (LSTM) neural networks. The paper presents a practical example of how the map-reduce algorithm can be implemented and how repetitive patterns in energy consumption can be learned.

Keywords: Big data, machine learning, LSTM, energy consumption, forecasting, neural networks

1 Introduction

The rapid digitalization and smart integration of shop floor devices and control software systems have caused an explosion in the number of data points available in large scale manufacturing systems. The degree to which enterprises are able to capture value from processing this data and to extract useful insights from it is a differentiating factor in the short- and medium-term development and optimization of the processes that drive manufacturing operations.
There are three important dimensions when processing data: aggregating at the right logical levels when data originates from multiple sources, aligning the data streams in normalized time intervals, and extracting insights from real-time data streams. All these dimensions should be considered in the context of scale. In other words, the processing of this