Received December 8, 2020, accepted December 18, 2020, date of publication December 24, 2020, date of current version January 4, 2021.
Digital Object Identifier 10.1109/ACCESS.2020.3047125

Development of Hypergraph Based Improved Random Forest Algorithm for Partial Discharge Pattern Classification

SUGANYA GOVINDARAJAN 1, JORGE ALFREDO ARDILA-REY 2 (Member, IEEE), KANNAN KRITHIVASAN 1, JAYALALITHA SUBBAIAH 1, NIKHITH SANNIDHI 3, AND M. BALASUBRAMANIAN 1

1 Electrical and Electronics Engineering Department, SASTRA Deemed University, Thanjavur 613401, India
2 Departamento de Ingeniería Eléctrica, Universidad Técnica Federico Santa María, Santiago de Chile 8940000, Chile
3 Zoho Corporation, Chennai 600042, India

Corresponding author: Jorge Alfredo Ardila-Rey (jorge.ardila@usm.cl)

This work was supported in part by the Agencia Nacional de Investigación y Desarrollo through the Fondecyt Regular Project under Grant 1200055 and the Fondef Project under Grant ID19I10165, in part by the UTFSM through DST-FIST, PI_m_19_01 Project under Grant SR/FST/ETI-338/2013(C) (dated September 10, 2014) and Grant FST/MSI-107/2015(C), and in part by Tata Realty-IT City-SASTRA Srinivasa Ramanujam Research Cell of SASTRA Deemed University.

ABSTRACT Precise partial discharge (PD) detection is a key factor in anticipating insulation failures. The continuous efforts of researchers have led to the design of a variety of algorithms focusing on PD pattern classification. However, the trade-off between the features taken up for classification and the detection rate continues to pose considerable challenges in terms of feature selection from acquired data, increased computing time, and so on. In this article, a Hypergraph (HG) based improved Random Forest (RF) algorithm employing Recursive Feature Elimination (RFE), termed HG-RF-RFE, has been developed for PD source classification.
HG representation of data is considered for obtaining statistical features, which turn out to be a subset of the set of all hyper edges and are called Hyper statistical features (Helly, Non-Helly, and Isolated hyper edges). HG-RF-RFE takes hyper statistical features and hyper edges as features for classification. The algorithm's efficiency is tested on noise-free PD data obtained from the SASTRA High Voltage Laboratory and on large-sized noisy PD data obtained from the High-Voltage Research and Test Laboratory at Universidad Técnica Federico Santa María (LIDAT). The robustness of the proposed algorithm is tested with both time and phase domain PD features, using the Matthews Correlation Coefficient (MCC) and the harmonic mean-based F1 Score as evaluation metrics, together with the k-fold validation technique. The proposed HG-RF-RFE achieved 98.8% accuracy with minimal features and significantly reduced computation time without compromising accuracy. It is worth mentioning that the HG-RF-RFE technique is superior to many state-of-the-art algorithms in terms of feature elimination and classification accuracy.

INDEX TERMS Hypergraph, partial discharge, pattern classification, random forest, recursive feature elimination, statistical features.

The associate editor coordinating the review of this manuscript and approving it for publication was Zhaojie Ju.

I. INTRODUCTION

Partial Discharge (PD) measurement has been identified as a reliable insulation assessment diagnostic tool for high voltage equipment. In dielectric materials (solid, liquid, or gaseous), cavities, voids, cracks, and gaps are significant defects that lead to physical as well as chemical deterioration in insulated interfaces when subjected to high voltage stress. Any type of electrical equipment affected by PD can suffer a series of severe insulation failures in the long term.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ VOLUME 9, 2021

The classification of PD patterns is an essential criterion for assessing and diagnosing the performance of insulation systems, as it provides a significant index of discharge severity. The classification process aims to identify the defect that causes the discharge (surface discharge, corona, etc.), whether internal or external. Since each defect has its own typical degradation mechanism, in order to assess the quality of the insulation, it is imperative to use this uniqueness to correlate