Recurrent Neural Networks for Online Remaining Useful Life Estimation in Ion Mill Etching System

Vishnu TV¹, Priyanka Gupta², Pankaj Malhotra³, Lovekesh Vig⁴, and Gautam Shroff⁵

¹,²,³,⁴,⁵ TCS Research, Noida, Uttar Pradesh, 201309, India
vishnu.tv@tcs.com
priyanka.g35@tcs.com
malhotra.pankaj@tcs.com
lovekesh.vig@tcs.com
gautam.shroff@tcs.com

Vishnu TV et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

ABSTRACT

We describe our approach, submitted as part of the 2018 PHM Data Challenge, for estimating time-to-failure or Remaining Useful Life (RUL) of Ion Mill Etching Systems in an online fashion using data from multiple sensors. RUL estimation from multi-sensor data can be considered as learning a regression function that maps a multivariate time series to a real-valued number, i.e. the RUL. We use a deep Recurrent Neural Network (RNN) to learn the metric regression function from multivariate time series. We highlight practical aspects of the RUL estimation problem in this data challenge such as i) multiple operating conditions, ii) lack of knowledge of the exact onset of failure or degradation, iii) different operational behavior across tools in terms of the range of values of parameters, etc. We describe our solution in the context of these challenges. Importantly, multiple modes of failure are possible in an ion mill etching system; therefore, it is desirable to estimate the RUL with respect to each of the failure modes. The data challenge considers three such failure modes and requires estimating RULs with respect to each one, which implies learning three metric regression functions, one corresponding to each failure mode. We propose a simple yet effective extension to existing methods of RUL estimation using RNN-based regression to learn a single deep RNN model that can simultaneously estimate RULs corresponding to all three failure modes. Our best model is an ensemble of two such RNN models and achieves a score of 1.91 × 10^7 on the final validation set.

1. INTRODUCTION

With the advent of the Industrial Internet of Things (IIoT) (Xu et al., 2014), large amounts of temporal sensor data are available in (near) real time, leading to an increasing interest in remote monitoring of equipment. Typically, a large number of sensors are installed across various components and sub-components of a complex system. This makes manual monitoring of the system extremely challenging. Data-driven approaches can help operators monitor the sensor data and generate suitable alerts along with potential diagnostics in case of a malfunctioning system. Building data-driven or machine learning based models for fault detection and prognostics (remaining useful life estimation) from sensor data can help in real-time monitoring of equipment, avoid catastrophic failures, enable condition-based maintenance, and support key engineering decisions, e.g. to improve future manufacturing processes.

Recently, deep Recurrent Neural Networks (RNNs) based on gated units such as Long Short Term Memory (LSTM) networks (Hochreiter & Schmidhuber, 1997) have been successfully used for modeling sequential data. It has been shown that RNNs can model the temporal (sequential) aspect of the sensor data as well as capture the inter-sensor dependencies (Malhotra et al., 2015). RNNs have been used to model the behavior of machines based on multi-sensor time series with applications to anomaly and fault detection (Malhotra et al., 2015; Malhotra, Ramakrishnan, et al., 2016; Yadav et al., 2016; Filonov et al., 2016), Remaining Useful Life (RUL) estimation (Malhotra, TV, et al., 2016; Gugulothu et al., 2017; TV et al., 2018), and diagnostics (TV et al., 2017; Gugulothu et al., 2018).
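The idea stated in the abstract, a single recurrent model that maps a multivariate time series to one real-valued RUL per failure mode at every time step, can be illustrated with a minimal sketch. This is not the authors' model: it assumes a plain tanh recurrence in place of gated (LSTM-style) units, randomly initialized and untrained weights, and hypothetical names and sizes (`init_params`, `estimate_ruls`, `mean_squared_error`, four sensors, eight hidden units). It shows only the shape of the computation, with a shared recurrent state feeding a three-output linear layer.

```python
import math
import random


def init_params(n_sensors, n_hidden, n_modes, seed=0):
    """Create small random weights (hypothetical sizes, untrained)."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "W_in": mat(n_hidden, n_sensors),   # sensor readings -> hidden state
        "W_rec": mat(n_hidden, n_hidden),   # previous hidden state -> hidden state
        "b_h": [0.0] * n_hidden,
        "W_out": mat(n_modes, n_hidden),    # hidden state -> one RUL per failure mode
        "b_out": [0.0] * n_modes,
    }


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def estimate_ruls(series, params):
    """Return one RUL estimate per failure mode for every time step.

    Assumption: a simple tanh recurrence stands in for the gated units
    a deep RNN would normally use.
    """
    h = [0.0] * len(params["b_h"])
    estimates = []
    for x_t in series:
        a_in = matvec(params["W_in"], x_t)
        a_rec = matvec(params["W_rec"], h)
        h = [math.tanh(a + r + b) for a, r, b in zip(a_in, a_rec, params["b_h"])]
        out = matvec(params["W_out"], h)
        estimates.append([o + b for o, b in zip(out, params["b_out"])])
    return estimates


def mean_squared_error(estimates, targets):
    """Average squared error over all time steps and failure modes."""
    total, count = 0.0, 0
    for est_t, tgt_t in zip(estimates, targets):
        for e, t in zip(est_t, tgt_t):
            total += (e - t) ** 2
            count += 1
    return total / count
```

Calling `estimate_ruls(series, init_params(4, 8, 3))` on a list of four-element sensor readings returns a list with one three-element estimate per time step. Because the three outputs share the same hidden state, a single model serves all failure modes, and training would minimize a regression loss such as `mean_squared_error` against the known RUL targets.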
Several approaches for RUL estimation using RNNs have been proposed in the past for various types of equipment, e.g. turbofan engines (Heimes, 2008; Malhotra, TV, et al., 2016; Gugulothu et al., 2017), milling machines (Malhotra, TV, et al., 2016), etc. These approaches can be categorised into two types: supervised and semi-supervised. Supervised approaches model RUL estimation as a metric regression problem where RUL is considered to be a real-valued number and a metric regression function, modeled via a (deep) RNN, is learned to map the time series of sensor data to RUL. Examples of this approach include (Heimes, 2008; Zheng et al., 2017; TV et al., 2018). Semi-supervised approaches first learn a deep RNN based model of normal behavior, which is then used to obtain a health index trend of any instance of a machine. The health index trend of a test instance is com-