Citation: Papathanasiou, D.;
Demertzis, K.; Tziritas, N. Machine
Failure Prediction Using Survival
Analysis. Future Internet 2023, 15, 153.
https://doi.org/10.3390/fi15050153
Academic Editors: Remus BRAD and
Arpad Gellert
Received: 23 March 2023
Revised: 16 April 2023
Accepted: 19 April 2023
Published: 22 April 2023
Copyright: © 2023 by the authors.
Licensee MDPI, Basel, Switzerland.
This article is an open access article
distributed under the terms and
conditions of the Creative Commons
Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
future internet
Article
Machine Failure Prediction Using Survival Analysis
Dimitris Papathanasiou 1,*, Konstantinos Demertzis 2,* and Nikos Tziritas 3
1 Department of Computer Science and Biomedical Informatics, University of Thessaly, Papasiopoulou 2–4, 35131 Lamia, Greece
2 School of Science and Technology, Informatics Studies, Hellenic Open University, Aristotle 18, 26335 Patra, Greece
3 Department of Computer Science and Telecommunications, University of Thessaly, 35100 Lamia, Greece
* Correspondence: padimitrios@uth.gr (D.P.); kdemertz@fmenr.duth.gr (K.D.)
Abstract: With the rapid growth of cloud computing and the creation of large-scale systems such
as IoT environments, the failure of machines/devices and, by extension, the systems that rely on
them is a major risk to their performance, usability, and the security systems that support them. The
need to predict such anomalies in combination with the creation of fault-tolerant systems to manage
them is a key factor for the development of safer and more stable systems. In this work, a model
consisting of survival analysis, feature analysis/selection, and machine learning was created, in
order to predict machine failure. The approach is based on the random survival forest (RSF) model and an
architecture that retains only the features most important to the cause of machine failure.
The objectives of this paper are to (1) create an efficient feature filtering mechanism that combines
different feature importance ranking methods to remove the “noise” from the data and leave
only the relevant information; the mechanism uses RadViz, Cox, Rank2D, random
survival forest feature ranking, and recursive feature elimination, each of which provides
a different understanding of the data; and (2) predict machine failure with a high degree of
accuracy using the RSF model trained on the optimal features. The proposed method yields
superior performance compared to other similar models, with a C-index of
approximately 97%. The consistency of the model’s predictions makes it viable in large-scale systems,
where it can be used to improve the performance and security of these systems while also lowering
their overall cost and increasing their longevity.
Keywords: machine failure; survival analysis; random survival forest; feature analysis; feature
selection
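The C-index quoted in the abstract is a ranking metric: among comparable pairs of machines, it is the fraction for which the machine that failed earlier was also assigned the higher predicted risk. The following is a minimal sketch of Harrell's concordance index for right-censored data, not the authors' implementation. The function name `concordance_index` and the toy data are hypothetical, and pairs with tied observation times are simply skipped for brevity.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored data (simplified sketch).

    times:       observed time per machine (failure or censoring time)
    events:      True if the failure was observed, False if censored
    risk_scores: model output; a higher score means earlier expected failure
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Simplification: pairs with identical observed times are skipped.
        if times[i] == times[j]:
            continue
        # Order the pair so that `a` is the machine with the shorter time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # The pair is comparable only if the shorter time is a real failure;
        # if it is censored, we cannot tell which machine failed first.
        if not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # earlier failure received the higher risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: four machines, the third one is censored.
times = [5, 10, 15, 20]
events = [True, True, False, True]
perfect_risk = [0.9, 0.7, 0.4, 0.1]
mixed_risk = [0.9, 0.1, 0.4, 0.7]

print(concordance_index(times, events, perfect_risk))
print(concordance_index(times, events, mixed_risk))
```

A value of 1.0 means every comparable pair is ranked correctly and 0.5 corresponds to random ranking, so a C-index near 0.97 indicates that almost all comparable pairs are ordered correctly.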
1. Introduction
Critical infrastructure systems, such as water supply, power supply, transportation,
and telecommunications, play a significant role in the sustainable development of modern
societies. Modern infrastructure systems are highly interconnected and consist of geographically
extensive networks. Continuous communication and data exchange between these
systems leads to interdependencies that are essential for their proper functioning and for the
functioning of the overall system they belong to. Because infrastructure systems are networked
at such a large scale, their failure can cause economic, social, health, and environmental
problems. Such failures can arise from extreme natural phenomena
(hurricanes, floods), technological disasters, or cyber-attacks. As a result, systems of
this type must be regularly monitored, upgraded, and maintained [1].
Ensuring the healthy and continuous operation of systems such as aircraft engines,
cars, computer servers, and even satellites is imperative, given their contribution to
critical services beyond urban infrastructure. The accurate prediction of their malfunctions
and, by extension, their operational interruptions, can contribute to improvements in the
design of proactive fault-tolerant systems, as well as significant cost reduction through