Received 3 October 2022, accepted 17 October 2022, date of publication 21 October 2022, date of current version 27 October 2022.

Digital Object Identifier 10.1109/ACCESS.2022.3216416

Centrality Combination Method Based on Feature Selection for Protein Interaction Networks

HAOYUE WANG, LI PAN, JING SUN, BIN LI, JUNQIANG JIANG (Member, IEEE), BO YANG, AND WENBIN LI

Department of Information Science and Engineering, Hunan Institute of Science and Technology, Yueyang 414006, China

Corresponding authors: Li Pan (lipan@hnist.edu.cn) and Wenbin Li (wenbin_lii@163.com)

This work was supported in part by the Scientific Research Fund of Hunan Provincial Education Department of China under Grant 18A312 and Grant 19B231, in part by the Hunan Provincial Postgraduate Research and Innovation Foundation Project of China under Grant CX20211183, and in part by the Postgraduate Research and Innovation Foundation Project of Hunan Institute of Science and Technology of China.

ABSTRACT Essential proteins are important participants in various life activities and play a vital role in the survival and reproduction of life. Network-based centrality methods are a common way to identify essential proteins in protein interaction networks. Because the existing centrality methods differ from one another, combining them is a feasible way to improve the accuracy of essential protein identification. In this paper, we propose a centrality combination method based on feature selection. First, the measure values of 14 classical centrality methods are viewed as feature data. Then, a subset of the relevant features is selected according to feature importance. Finally, the centrality methods corresponding to the selected features are combined by using the geometric mean method for the identification of essential proteins.
To verify the effectiveness of the combination method, we apply it to the original static protein interaction network (SPIN), the dynamic protein interaction network (DPIN), and the refined dynamic protein interaction network (RDPIN), and compare the results with those of each single centrality method (LAC, DC, DMNC, NC, TP, CLC, BC, LC, CC, KC, CR, EC, PR, LR). The experimental results on the identification of essential proteins show that the combination method achieves better prediction performance than the 14 centrality methods in terms of prediction precision, sensitivity, specificity, positive predictive value, negative predictive value, F-measure, and accuracy rate. These results illustrate that the proposed method can help to identify essential proteins more accurately.

INDEX TERMS Centrality methods, combination method, essential proteins, feature selection, protein interaction networks.

The associate editor coordinating the review of this manuscript and approving it for publication was Ali Salehzadeh-Yazdi.

I. INTRODUCTION

Proteins are the main carriers of biological life activities, and different proteins differ in their importance to those activities. On this basis, proteins can be divided into essential proteins and non-essential proteins. Essential proteins are closely related to the metabolism, differentiation, and apoptosis of biological cells and are indispensable for cell survival. Proteins participate in all aspects of life processes, such as biological signal transmission, gene expression regulation, and energy and material metabolism, through their interactions with each other, forming a protein interaction network.

Previously, some researchers have identified essential proteins experimentally, using techniques such as single gene knockouts [1], RNA interference [2], and conditional knockouts [3], which are expensive, time-consuming, and not always feasible.
With the explosion of high-throughput data, computational methods can identify essential proteins quickly and at lower cost. Among them, network-based centrality methods are an important class of methods for essential protein identification [4], [5]. Network-based methods are mainly divided into three categories [6]: neighborhood-based methods, path-based methods, and eigenvector-based methods.

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

112028 VOLUME 10, 2022
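
To make the three categories of network-based methods and the geometric mean combination more concrete, the following is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the implementation used in this paper: the toy network, the function names, and the choice of one measure per category (DC, CC, EC) are hypothetical, and the feature selection step that picks which of the 14 measures to combine is omitted.

```python
from collections import deque
from math import prod

# Hypothetical toy interaction network (undirected), stored as adjacency sets.
TOY_PIN = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A", "E"},
    "E": {"D"},
}

def degree_centrality(adj):
    """Neighborhood-based example (DC): number of direct interaction partners."""
    return {v: float(len(nbrs)) for v, nbrs in adj.items()}

def closeness_centrality(adj):
    """Path-based example (CC): reachable nodes divided by total BFS distance."""
    scores = {}
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total = sum(dist.values())
        scores[src] = (len(dist) - 1) / total if total > 0 else 0.0
    return scores

def eigenvector_centrality(adj, iterations=100):
    """Eigenvector-based example (EC): power iteration on A + I.

    Adding the identity keeps the iteration from oscillating on bipartite
    graphs and does not change the leading eigenvector.
    """
    scores = {v: 1.0 for v in adj}
    for _ in range(iterations):
        nxt = {v: scores[v] + sum(scores[w] for w in adj[v]) for v in adj}
        norm = max(nxt.values()) or 1.0
        scores = {v: s / norm for v, s in nxt.items()}
    return scores

def combine_geometric_mean(score_maps):
    """Combine k centrality score maps by the per-node geometric mean."""
    k = len(score_maps)
    return {v: prod(m[v] for m in score_maps) ** (1.0 / k) for v in score_maps[0]}

# Assumption: these three measures stand in for the selected feature subset.
selected = [
    degree_centrality(TOY_PIN),
    closeness_centrality(TOY_PIN),
    eigenvector_centrality(TOY_PIN),
]
combined = combine_geometric_mean(selected)

# Proteins sorted from most to least central under the combined score.
ranking = sorted(combined, key=combined.get, reverse=True)
```

Rescaling any single measure by a positive constant multiplies every combined score by the same factor, so the ranking produced by the geometric mean does not depend on how the individual measures are scaled. A node that scores zero on any measure receives a combined score of zero, which may or may not be desirable depending on the measures chosen.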