MULTI-PCA DRIVEN APPROACH for FAULT DETECTION and ROOT CAUSE ANALYSIS of PROCESS EQUIPMENT

Jinendra K Gugaliya
ABB Ability Innovation Centre, Bangalore, India
jinendra.gugaliya@in.abb.com

Rahul Kumar Vij
ABB Ability Innovation Centre, Bangalore, India
Rahul.Kumar-vij@in.abb.com

Srini Ramaswamy
ABB, Power Generation, Cleveland, USA
srini.ramaswamy@us.abb.com

Lakshwin Shreesha M K
BMS College of Engineering, Bangalore, India
shreeshalakshwin@gmail.com

Abstract

Principal Component Analysis (PCA) is widely used for fault detection and diagnosis in industrial applications. PCA assumes linear relationships among the features and represents the data as linear combinations of them. However, a typical industrial application can exhibit non-linearity, either because the asset operates in multiple operating regions or because of inherently non-linear relationships among the features. This paper proposes a novel clustering-based Multi-PCA approach that divides the overall non-linearity into simpler linear regions, each of which can subsequently be modelled by its own PCA model. The clustering draws on domain knowledge, namely the fact that operating an asset at different operating points can lead to a multimodal distribution of the variables. The proposed approach is structured in three steps: 1) feature set selection, 2) Hierarchical Density-Based Spatial Clustering (HDBSCAN), and 3) fitting a PCA model in each cluster. The proposed approach retains the computational simplicity of PCA compared with other non-linear modelling approaches such as neural-network-based autoencoders. Finally, the paper also proposes a simplified Root Cause Analysis (RCA) algorithm for identifying the cause of a fault.

1 Introduction

Industrial assets such as motors, pumps, fans, and turbines are subject to faults and failures due to operation at excess load conditions or due to aging effects.
Identifying that an industrial asset is drifting towards an abnormal condition is the key to avoiding unplanned plant downtime caused by asset failure. In the literature, there are two main approaches to detecting abnormal asset health. The first is based on detailed know-how and the physics of the asset; the second treats the asset as a black box. The first approach works well for simpler assets such as a motor, as the underlying physics is well established. However, this approach is not easily scalable, since it requires one to develop physics-based models for every asset. Additionally, as industrial assets become more complex, such an approach becomes difficult to implement. Hence, there is a significant shift towards data-driven approaches for asset health monitoring. Owing to the availability of low-cost sensors and digital technology, large amounts of data can be collected from an industrial asset, and approaches based on machine learning principles can be applied to learn a model of the asset in a semi-automated manner. Such an approach is easily scalable and can be applied to a variety of machines.

Copyright © 2020 held by the author(s). In A. Martin, K. Hinkelmann, H.-G. Fill, A. Gerber, D. Lenat, R. Stolle, F. van Harmelen (Eds.), Proceedings of the AAAI 2020 Spring Symposium on Combining Machine Learning and Knowledge Engineering in Practice (AAAI-MAKE 2020). Stanford University, Palo Alto, California, USA, March 23-25, 2020. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Some of the earliest fault detection techniques were model based. One popular algorithm was built around analytic redundancy, wherein the measurements from the monitored system were compared with the outputs obtained from an analytical mathematical model in order to detect the presence of a fault (M. Frank 1990).
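A minimal sketch of residual-based detection in the spirit of analytic redundancy is given here: a measured signal is compared with the prediction of an analytical model, and a sample is flagged when the difference exceeds a threshold. The linear model, its gain and offset, the threshold, and all function names are illustrative assumptions and are not taken from (M. Frank 1990).

```python
import numpy as np


def model_output(u, gain=2.0, offset=1.0):
    """Hypothetical analytical model of the asset: y = gain * u + offset."""
    return gain * np.asarray(u, dtype=float) + offset


def detect_faults(u, y_measured, threshold=0.5):
    """Return (flags, residual), where residual = measured - model prediction.

    A sample is flagged as faulty when |residual| exceeds the threshold.
    """
    residual = np.asarray(y_measured, dtype=float) - model_output(u)
    flags = np.abs(residual) > threshold
    return flags, residual


# Inputs applied to the monitored system (assumed values).
u = np.array([0.0, 1.0, 2.0, 3.0])

# Simulated measurements: healthy everywhere except one injected deviation.
y = model_output(u).copy()
y[2] += 1.5

flags, residual = detect_faults(u, y)
print(flags)
print(residual)
```

A single scalar residual per signal is easy to threshold, but it treats each signal independently, so it does not by itself account for correlations among many variables.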
However, this comparison was a naive estimate and failed to capture faulty conditions in high-dimensional spaces. Subsequently, several approaches based on multivariate statistical process control methods (MacGregor and Kourti 1995; Kresta and Marlin 1991; Macgregor 1994) were presented for the diagnosis of complex physical processes. The use of state observers, with faults modelled as changes in state variables (Isermann 2005), provided a better strategy for aberration detection than statistical process control, albeit at a higher computational cost.

A pressing need to capture and localize abnormalities at a lower computational cost brought about the use of Principal Component Analysis (PCA). PCA provides a new view of the data and aims to capture the hidden structure underneath data redundancy and noise (Pearson 1901). Many PCA-based algorithms appear in the literature. One such approach uses the Q and T² statistics (Villegas, Fuente, and Rodríguez 2010) for fault detection. This methodology was subsequently simulated for fault detection in a wastewater treatment plant (Garcia-Alvarez 2009), wherein the authors showcase results which capture local linear structure only. An improvement to the conventional PCA model came with the introduction of dynamic PCA (DPCA) (Russell, Chiang, and Braatz 2000), which also takes into account the dependency of current observations on previous time instances. A non-linear modification to the PCA involved the combination of using the Kronecker product, wavelet decomposition