A Microservices Architecture for Machine Learning Assisted Decision Support in a Real-Time Field Sensors Environment

Giovanni De Gasperis 1, Giuseppe Della Penna 1 and Sante Dino Facchini 1

1 Università degli Studi dell’Aquila, Dipartimento di Ingegneria e Scienze dell’Informazione e Matematica, Via Vetoio, L’Aquila, 67100, Italy

Abstract
In this paper we describe the design and development of a real-world software system that integrates machine learning to augment a pre-existing remote surveillance framework. Machine learning was embedded as a service in the system, plugged in between back-end data flux handlers; the system has been redesigned following a microservices architecture to make it scalable and to allow a progressive adoption of machine learning-powered assistance in the event management process. We analyse and discuss a case study of the application in an actual security company, showing how this innovation helped human operators to better shield themselves from "information overload".

Keywords
Real-Time Critical Systems, Machine Learning, Big Data, Microservices

1. Introduction

In this paper we describe the design and development of a real-world software system that integrates big data analytics and machine learning into a pre-existing remote surveillance framework. The framework is operated by a security company that monitors a number of sites through closed circuit and IP cameras, anti-theft sensors (e.g., volume and pressure sensors, door opening sensors, etc.) and physical sensors (e.g., humidity and temperature).

Figure 1 shows a fragment of the process commonly followed to handle events and alarms coming from a surveillance network. When an alarm is received, the operator first checks the surveillance videos. If such videos are not available or do not clearly show the event, the operator requests an on-site check from the security staff.
Such action and its outcome, as well as the outcomes of all the actions taken during the process, are stored in the system database. Then, if the event is in progress, the operator starts the true alarm handling process. Otherwise, if the notified event is not actually in progress, the operator must check for other alarms on the same site and, if there are any, restart the handling process for those new events. If no other site alarms are active, the operator must try to understand the reason for the notified alarm. If it is recognized as a false alarm, the case is simply closed. On the other hand, if the alarm is improper, i.e., it is due to a system anomaly, the operator starts an anomaly handling process.

The software adopted by the company to support this process was a monolithic application that offered only basic functionalities, such as collecting signals and data streams, presenting the events in a management console and saving them in a persistent database. Therefore, most of the operations in the event management process described above required a substantial amount of manual work by the control center operators.

ECSA2021 Companion Volume, Robert Heinrich, Raffaela Mirandola and Danny Weyns, Växjö, Sweden, 13-17 September 2021
Email: giovanni.degasperis@univaq.it (G. De Gasperis); giuseppe.dellapenna@univaq.it (G. Della Penna); santedino.facchini@student.univaq.it (S. Facchini)
URL: https://www.disim.univaq.it/main/home.php?users_username=giovanni.degasperis (G. De Gasperis); https://people.disim.univaq.it/~dellapenna (G. Della Penna)
ORCID: 0000-0001-9521-4711 (G. De Gasperis); 0000-0003-2327-9393 (G. Della Penna)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073
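As a purely illustrative aid, the fragment of the event handling process described for Figure 1 could be sketched as a small decision function. This is a minimal sketch under stated assumptions, not the company's software or our implementation: the names `Evidence`, `ActionLog`, `Outcome` and `handle_alarm`, as well as all field names, are hypothetical, and the in-memory log merely stands in for the system database.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List


class Outcome(Enum):
    """Possible results of handling one alarm (hypothetical names)."""
    TRUE_ALARM_HANDLING = auto()
    RESTART_FOR_OTHER_ALARMS = auto()
    CASE_CLOSED = auto()
    ANOMALY_HANDLING = auto()


@dataclass
class Evidence:
    """What the operator learns while handling one alarm (hypothetical fields)."""
    video_conclusive: bool        # videos are available and clearly show the event
    event_in_progress: bool       # established from the videos or the on-site check
    other_site_alarms: int = 0    # other active alarms on the same site
    system_anomaly: bool = False  # the alarm is "improper", i.e., due to an anomaly


@dataclass
class ActionLog:
    """In-memory stand-in for the database that stores actions and outcomes."""
    entries: List[str] = field(default_factory=list)

    def record(self, entry: str) -> None:
        self.entries.append(entry)


def handle_alarm(evidence: Evidence, log: ActionLog) -> Outcome:
    """Follow the decision steps of the process fragment for a single alarm."""
    log.record("check surveillance videos")
    if not evidence.video_conclusive:
        # Videos missing or unclear: ask the security staff for an on-site check.
        log.record("request on-site check")

    if evidence.event_in_progress:
        outcome = Outcome.TRUE_ALARM_HANDLING
    elif evidence.other_site_alarms > 0:
        # The process restarts for the other alarms raised on the same site.
        outcome = Outcome.RESTART_FOR_OTHER_ALARMS
    elif evidence.system_anomaly:
        outcome = Outcome.ANOMALY_HANDLING
    else:
        # False alarm: the case is simply closed.
        outcome = Outcome.CASE_CLOSED

    log.record(f"outcome: {outcome.name}")
    return outcome


if __name__ == "__main__":
    log = ActionLog()
    evidence = Evidence(video_conclusive=False, event_in_progress=False,
                        system_anomaly=True)
    print(handle_alarm(evidence, log))
    print(log.entries)
```

Every branch of this sketch depends on a judgement that, in the monolithic system, the operator had to make manually; these are the steps where assistance can be introduced progressively.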
While human intervention cannot be avoided in such a context, as in any security-related context, machine learning can be exploited to assist the operators in several steps of the process, leaving the humans with only the most critical steps to accomplish (see, e.g., [1, 2, 3, 4] for examples belonging to different surveillance contexts). However, embedding machine learning in the company's pre-existing software presented several challenges. First, we are modifying a production, real-time critical system, so we need to add such support gradually, in order to let the operators adapt to the new functionalities while verifying their reliability without interrupting the company services. Second, the closed, monolithic architecture of the company software described above makes any modification to the pre-existing process very complex and error-prone. It is also worth noting that such software, developed many years ago, was no longer adequate to meet the current high QoS levels or to comply with the latest safety regulations. Therefore, we decided to rebuild the system from scratch, extracting only some relevant modules/algorithms from the old software in order to embed it in