Biomedical Signal Processing and Control 65 (2021) 102359
1746-8094/© 2020 Elsevier Ltd. All rights reserved.
A novel entropy-based mapping method for determining the
protein-protein interactions in viral genomes by using coevolution analysis
Talha Burak Alakus a,*, Ibrahim Turkoglu b
a Kirklareli University, Engineering Faculty, Department of Software Engineering, Kirklareli, Turkey
b Firat University, Technology Faculty, Department of Software Engineering, Elazig, Turkey
ARTICLE INFO
Keywords:
Protein coevolution
Numerical mapping technique
Protein-protein interactions
Entropy
ABSTRACT
Protein-protein interactions play a vital role in DNA transcription, the immune system, and signal transmission
between cells. Determining the interactions between proteins can give information about the functional structure
of a cell and the functions of target organisms. Protein-protein interactions are determined by experimental
approaches, yet there is still a large gap in specifying all possible protein interactions in an organism.
Furthermore, since these approaches use cloning, labeling, and affinity mass spectrometry, the analysis process is
time-consuming and expensive. However, analyzing protein interactions with computational approaches
based on coevolution theory eliminates these limitations, since in the coevolution model, interacting
proteins show coevolutionary mutations and form similar phylogenetic trees. Current coevolution
methods are based on multiple-sequence alignment, yet these methods produce many false-positive
interactions. Therefore, it is important to perform computational coevolution analysis. Protein-protein
interaction analysis based on coevolution has been employed in conjunction with experimental
approaches to explore new protein interactions. However, in order to predict protein interactions with
computational coevolution analysis, protein sequences need to be mapped. There are various types of
protein mapping methods belonging to certain categories in the literature. These methods are frequently used in
studies of predicting protein interactions. In this study, as an alternative to these methods, we proposed a novel
entropy-based protein mapping method and predicted protein-protein interactions in viral genomes by using
coevolution analysis. The study consists of five stages. In the first stage, the protein sequences of viral genomes were
mapped using both the proposed numerical mapping method and state-of-the-art protein mapping methods. In the
second stage, the Fourier transform was applied to each mapped protein sequence. In the third stage, the distance
matrix was generated by finding the distances between the proteins belonging to the same virus genome. In the
fourth stage, Pearson correlation values between the distances were calculated and coevolution analysis was
performed. In the last stage, the proposed mapping method was compared with state-of-the-art protein mapping
methods and the MirrorTree approach. Coevolution analysis was performed on two different virus genomes: Ebola
virus and Influenza A virus. With the proposed method, a high degree of correlation was obtained between
proteins of the Ebola virus. For Ebola virus, the lowest correlation result (0.75) was obtained between the NP-
VP35 protein pair. The highest correlation (0.99) was observed between the NP-VP24 and NP-VP40 protein
pairs. For Influenza A, the proposed method gave the lowest correlation (0.09) for the M1-PA(X) protein pair
and the highest correlation (0.98) for the M1-M2 protein pair. The proposed method confirmed
experimentally proven interactions between protein pairs with high correlation values.
These results indicate that the proposed method
can be effective in predicting protein interactions.
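The first four stages described above (mapping, Fourier transform, distance matrix, Pearson correlation) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not define the entropy-based mapping, so `entropy_map` is a hypothetical stand-in that assigns each residue the Shannon entropy term of its frequency within the sequence. The Euclidean distance between magnitude spectra, the function names, and the toy sequences are likewise illustrative choices and not real viral data.

```python
from collections import Counter

import numpy as np


def entropy_map(seq):
    """Hypothetical stand-in mapping (NOT the paper's method):
    each residue -> -p * log2(p), where p is its frequency in `seq`."""
    counts = Counter(seq)
    n = len(seq)
    value = {aa: -(c / n) * np.log2(c / n) for aa, c in counts.items()}
    return np.array([value[aa] for aa in seq])


def spectrum(signal, n_points):
    """Magnitude spectrum of the mean-removed signal.
    rfft zero-pads (or crops) the input to `n_points` samples."""
    return np.abs(np.fft.rfft(signal - signal.mean(), n=n_points))


def distance_matrix(seqs):
    """Pairwise Euclidean distances between the spectra of one protein's
    sequences (for example, the same protein taken from several strains)."""
    n_points = max(len(s) for s in seqs)  # pad all sequences to the longest
    spectra = np.array([spectrum(entropy_map(s), n_points) for s in seqs])
    diff = spectra[:, None, :] - spectra[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def coevolution_score(d_a, d_b):
    """Pearson correlation between the upper triangles of two
    distance matrices built over the same set of strains."""
    iu = np.triu_indices_from(d_a, k=1)
    return np.corrcoef(d_a[iu], d_b[iu])[0, 1]


# Toy, made-up sequences: row i of each list is assumed to come from strain i.
protein_a = ["MKTAYIAKQR", "MKTAYIAKQK", "MKSAYLAKQR", "MRTAWIAKQR"]
protein_b = ["GSHMLEDPVA", "GSHMLEDPVG", "GSHMIEDPLA", "GTHMLEDPVA"]

d_a = distance_matrix(protein_a)
d_b = distance_matrix(protein_b)
score = coevolution_score(d_a, d_b)
print(f"correlation between distance matrices: {score:.2f}")
```

In this sketch a score near 1 would mean the two proteins' distance matrices vary together across strains, which is the signal that coevolution analysis interprets as a candidate interaction. Both lists must cover the same strains in the same order for the correlation to be meaningful.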
1. Introduction
Most cellular processes in an organism are controlled and
fulfilled by proteins. Proteins are indispensable because they perform all of
these processes by interacting with other proteins, with nucleic
acids, and with other cell components. Interactions fall into two
* Corresponding author.
E-mail addresses: talhaburakalakus@klu.edu.tr (T.B. Alakus), iturkoglu@firat.edu.tr (I. Turkoglu).
https://doi.org/10.1016/j.bspc.2020.102359
Received 20 July 2020; Received in revised form 16 October 2020; Accepted 16 November 2020