Future Internet

Article

Machine Failure Prediction Using Survival Analysis

Dimitris Papathanasiou 1,*, Konstantinos Demertzis 2,* and Nikos Tziritas 3

1 Department of Computer Science and Biomedical Informatics, University of Thessaly, Papasiopoulou 2–4, 35131 Lamia, Greece
2 School of Science and Technology, Informatics Studies, Hellenic Open University, Aristotle 18, 26335 Patra, Greece
3 Department of Computer Science and Telecommunications, University of Thessaly, 35100 Lamia, Greece
* Correspondence: padimitrios@uth.gr (D.P.); kdemertz@fmenr.duth.gr (K.D.)

Citation: Papathanasiou, D.; Demertzis, K.; Tziritas, N. Machine Failure Prediction Using Survival Analysis. Future Internet 2023, 15, 153. https://doi.org/10.3390/fi15050153

Academic Editors: Remus Brad and Arpad Gellert

Received: 23 March 2023; Revised: 16 April 2023; Accepted: 19 April 2023; Published: 22 April 2023

Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Abstract: With the rapid growth of cloud computing and the creation of large-scale systems such as IoT environments, the failure of machines/devices and, by extension, the systems that rely on them is a major risk to their performance, usability, and the security systems that support them. The need to predict such anomalies in combination with the creation of fault-tolerant systems to manage them is a key factor for the development of safer and more stable systems. In this work, a model consisting of survival analysis, feature analysis/selection, and machine learning was created, in order to predict machine failure. The approach is based on the random survival forest model and an architecture that aims to filter the features that are of major importance to the cause of machine failure.
The objectives of this paper are to (1) create an efficient feature filtering mechanism that combines different methods of feature importance ranking to remove the "noise" from the data and leave only the relevant information; and (2) predict machine failure with a high degree of accuracy using a random survival forest (RSF) model trained on the selected features. The filtering mechanism uses RadViz, Cox, Rank2D, random survival forest feature ranking, and recursive feature elimination, each of which provides a different view of the data. The proposed method outperforms other similar models, reaching a C-index of approximately 97%. The consistency of the model's predictions makes it viable in large-scale systems, where it can be used to improve performance and security while also lowering overall cost and extending system longevity.

Keywords: machine failure; survival analysis; random survival forest; feature analysis; feature selection

1. Introduction

Critical infrastructure systems, such as water supply, power supply, transportation, and telecommunications, play a significant role in the sustainable development of modern societies. Modern infrastructure systems are highly interconnected and consist of geographically extensive networks. Continuous communication and data exchange between these systems lead to interdependencies that are essential for their proper functioning and for the functioning of the overall system they belong to. Because infrastructure systems are networked on such a large scale, their failure can cause economic, social, health, and environmental problems. Such failures can arise from extreme natural phenomena (e.g., hurricanes, floods), technological disasters, or cyber-attacks. As a result, systems of this type must be regularly monitored, upgraded, and maintained [1].
Beyond urban infrastructure, ensuring the healthy and continuous operation of systems such as aircraft engines, cars, computer servers, and even satellites is imperative, given their contribution to critical services. The accurate prediction of their malfunctions and, by extension, their operational interruptions can contribute to improvements in the design of proactive fault-tolerant systems, as well as significant cost reduction through