IEEE SENSORS JOURNAL, VOL. XX, NO. XX, XXXX 2022

A Machine-Learning Architecture for Sensor Fault Detection, Isolation and Accommodation in Digital Twins

Hossein Darvishi, Student Member, IEEE, Domenico Ciuonzo, Senior Member, IEEE, and Pierluigi Salvo Rossi, Senior Member, IEEE

Abstract: Sensor technologies empower Industry 4.0 by enabling integration of in-field and real-time raw data into digital twins. However, sensors might be unreliable due to inherent issues and/or environmental conditions. This paper aims at detecting anomalies instantaneously in sensor measurements, identifying the faulty sensors and accommodating them with appropriate estimated data, thus paving the way to reliable digital twins. More specifically, a general real-time machine-learning-based architecture for sensor validation is proposed, built upon a series of neural-network estimators and a classifier. The estimators act as virtual sensors for all unreliable sensors (to reconstruct normal behaviour and replace the isolated faulty sensor within the system), whereas the classifier is used for the detection and isolation tasks. A comprehensive statistical analysis on three different real-world data-sets is conducted, and the performance of the proposed architecture is validated under hard and soft synthetically-generated faults.

Index Terms: Digital twin, Fault diagnosis, Machine learning, Neural networks, Sensor validation.

I. INTRODUCTION

Digital twins (DTs) have recently emerged in several industrial applications and exploit Internet of Things (IoT) technology [1]. More specifically, most environments have been pervaded by the extensive use of spatially-distributed sensors, generating enormous amounts of heterogeneous data over time, which requires advanced integrated solutions involving sensing, communication, and processing [2]–[4].
DTs represent one of the main products for building advanced analytics over such data and extracting relevant information for prediction and effective control. DTs have been widely employed in various sectors such as industry [5], health care [6] and smart cities [7], [8], where their capability to visualize and handle a perpetual stream of real-time sensor data is enabling new opportunities. Leveraging sensor data enables DTs to model system dynamics effectively for remote monitoring and control, for safety and risk analysis, and for maintenance purposes. Since DTs rely on accurate sensor data, system performance may be severely affected by sensor failures.

This work was partially supported by the Research Council of Norway under the project SIGNIFY within the IKTPLUSS framework (project nr. 311902). Part of this work was presented at the IEEE International Conference on Networking, Sensing and Control (ICNSC) 2021.

H. Darvishi is with the Department of Electronic Systems, Norwegian University of Science and Technology, 7491 Trondheim, Norway, and with the Signal Processing Laboratory (LTS4), École polytechnique fédérale de Lausanne (EPFL), CH-1015 Lausanne, Switzerland (e-mail: hossein.darvishi@ntnu.no).

D. Ciuonzo is with the Department of Electrical Engineering and Information Technologies (DIETI), University of Naples “Federico II”, 80125 Naples, Italy (e-mail: domenico.ciuonzo@unina.it).

P. Salvo Rossi is with the Department of Electronic Systems, Norwegian University of Science and Technology, 7491 Trondheim, Norway, and with the Department of Gas Technology, SINTEF Energy Research, 7491 Trondheim, Norway (e-mail: salvorossi@ieee.org).

Sources of sensor faults are commonly found in:
(i) Hardware and inherited limitations: sensors are electronic components and can collect inaccurate measurements or stop working without any indication due to low production quality, calibration issues, low battery level, end of life span, or poor connections [9];
(ii) Harsh environment: in real-world scenarios, sensors can be deployed in inaccessible and unattended environments, where unforeseen situations may hinder sensor performance [10];
(iii) Malicious attacks: faulty data can be injected by an attacker into a vulnerable system [11], [12].

A fault in a system refers to a complete (or partial) malfunction and manifests over a permanent (or transient) time span. Fig. 1 shows the most common types of sensor faults in a sensor network (a detailed discussion of sensor faults is found in [13], [14]). Depending on the characteristics of the sensor data, faults can be classified as follows:
1) Bias fault: also known as offset fault, the deviation from nominal values is given by an additive constant bias;
2) Drift fault: sensor readings drift with a small slope from nominal values (drift faults are more subtle since they