1545-5963 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more
information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/TCBB.2016.2621042, IEEE/ACM Transactions on Computational Biology and Bioinformatics
Application of Genetic Programming (GP)
formalism for building disease predictive
models from protein-protein interactions (PPI)
data
Renu Vyas, Sanket Bapat, Purva Goel, M. Karthikeyan, S.S. Tambe and B.D. Kulkarni
Abstract—Protein-protein interactions (PPIs) play a vital role in the biological processes underlying cell function
and disease pathways. The experimental methods used to identify PPIs require considerable effort, and their results are
often compromised by a large number of false positives. Herein, we demonstrate the use of a new Genetic
Programming (GP) based Symbolic Regression (SR) approach for predicting PPIs related to a disease. In a case study, a
dataset of 135 PPI complexes related to cancer was used to construct a generic PPI-predicting
model with good prediction accuracy and generalization ability. A high correlation coefficient (CC) of 0.893 and low
root mean square error (RMSE) and mean absolute percentage error (MAPE) values of 478.221 and 0.239, respectively, were
achieved for both the training and test set outputs. To validate the discriminatory nature of the model, it was applied to a dataset of
diabetes complexes, where it yielded significantly low CC values. Thus, the GP model developed here serves a dual purpose: (a) a
predictor of the binding energy of cancer-related PPI complexes, and (b) a classifier for discriminating PPI complexes related to
cancer from those of other diseases.
Index Terms—Genetic Programming, Protein-protein interactions, Disease, Binding energy, Machine learning, Cancer, Symbolic
Regression
————————————————————
1 INTRODUCTION
Protein-protein interactions (PPIs) regulate the functions
and cellular activity within a cell [1]. The role of PPIs
in a homeostatic state is to carry out the various biological
and functional activities of the cell; any disturbance in the
cell disrupts the normal functioning of PPIs. DNA
replication, transcription, translation, splicing, cell cycle
control and signal transduction are some of the biological
processes in which proteins interact with each other. In a
diseased state, the normal pathways are up-regulated or
down-regulated based on the association or dissociation
between a pair of interacting proteins. Predicting a
protein's potential to interact with its partner protein is
therefore essential for modeling disease progression,
prognosis and prediction, and for understanding pathways [2].
In the present work, we propose to develop a predictive
model for identifying the protein-protein complexes related
to a disease. The interaction between two interacting
polypeptide chains was studied by employing various
parameters, such as the binding energy and other 3D
structural information obtained from X-ray solved
co-crystals, as model inputs (predictor variables). A
mathematical expression that correlates these predictor
variables with the propensity of the polypeptide chains to
bind with each other was developed using the evolutionary
GP-based symbolic regression (SR) method.
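To make the idea of evolving a mathematical expression from predictor variables more concrete, the following Python sketch may help. It is only an illustration under stated assumptions and not the authors' GP implementation: it uses mutation and truncation selection only (no crossover, so it is a simplified evolutionary search rather than full GP), the data are synthetic rather than PPI descriptors, all names (`random_tree`, `evaluate`, `mutate`, `evolve`) are hypothetical, and MAPE is assumed here to be expressed as a fraction rather than a percentage.

```python
import math
import random

# Protected function set (an assumption; the paper's operator set is not given here).
OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
    "div": lambda a, b: a / b if abs(b) > 1e-9 else 1.0,  # protected division
}

def random_tree(n_vars, depth, rng):
    """Grow a random expression: float constant, ("x", i) input, or (op, left, right)."""
    if depth == 0 or rng.random() < 0.3:
        if rng.random() < 0.7:
            return ("x", rng.randrange(n_vars))
        return round(rng.uniform(-5.0, 5.0), 2)
    op = rng.choice(sorted(OPS))
    return (op, random_tree(n_vars, depth - 1, rng), random_tree(n_vars, depth - 1, rng))

def evaluate(tree, row):
    """Evaluate an expression tree on one row of predictor values."""
    if isinstance(tree, float):
        return tree
    if tree[0] == "x":
        return row[tree[1]]
    return OPS[tree[0]](evaluate(tree[1], row), evaluate(tree[2], row))

def mutate(tree, n_vars, rng):
    """Replace one randomly chosen subtree with a freshly grown one."""
    if isinstance(tree, float) or tree[0] == "x" or rng.random() < 0.3:
        return random_tree(n_vars, 2, rng)
    op, left, right = tree
    if rng.random() < 0.5:
        return (op, mutate(left, n_vars, rng), right)
    return (op, left, mutate(right, n_vars, rng))

def rmse(y, p):
    """Root mean square error."""
    return math.sqrt(sum((a - b) * (a - b) for a, b in zip(y, p)) / len(y))

def mape(y, p):
    """Mean absolute percentage error as a fraction (assumes no zero targets)."""
    return sum(abs(a - b) / abs(a) for a, b in zip(y, p)) / len(y)

def pearson(y, p):
    """Pearson correlation coefficient (CC); returns 0.0 if either series is constant."""
    my, mp = sum(y) / len(y), sum(p) / len(p)
    cov = sum((a - my) * (b - mp) for a, b in zip(y, p))
    vy = sum((a - my) * (a - my) for a in y)
    vp = sum((b - mp) * (b - mp) for b in p)
    return cov / math.sqrt(vy * vp) if vy > 0 and vp > 0 else 0.0

def evolve(X, y, pop_size=60, generations=30, seed=0):
    """Mutation-only evolutionary search for an expression that minimizes RMSE."""
    rng = random.Random(seed)
    n_vars = len(X[0])

    def fitness(tree):
        err = rmse(y, [evaluate(tree, row) for row in X])
        return err if math.isfinite(err) else float("inf")

    pop = [random_tree(n_vars, 3, rng) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness)
        history.append(fitness(pop[0]))
        elite = pop[: pop_size // 4]  # truncation selection; elites survive unchanged
        children = [mutate(rng.choice(elite), n_vars, rng)
                    for _ in range(pop_size - len(elite))]
        pop = elite + children
    best = min(pop, key=fitness)
    return best, history + [fitness(best)]

# Synthetic toy data (NOT from the paper): two predictors, target 2*x0 + x0*x1.
data_rng = random.Random(1)
X = [[data_rng.uniform(1.0, 10.0), data_rng.uniform(1.0, 10.0)] for _ in range(20)]
y = [2.0 * a + a * b for a, b in X]

best, history = evolve(X, y)
pred = [evaluate(best, row) for row in X]
print("best expression:", best)
print("RMSE:", rmse(y, pred), "MAPE:", mape(y, pred), "CC:", pearson(y, pred))
```

Because the best individuals are carried over unchanged, the best RMSE cannot get worse from one generation to the next. A full GP system would typically add subtree crossover, size control, and a held-out test set to check generalization.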
1.1 Overview of machine learning studies on
protein-protein interactions
Experimentally, the information regarding protein-
————————————————
R. Vyas is with the MIT School of Bioengineering Science and Research, Loni,
Kalbhor, Pune.
E-mail: renu.vyas@mituniversity.edu.in
S. Bapat and M. Karthikeyan are with the DIRC, CSIR-National Chemical
Laboratory, Pune – 411008, India.
E-mail: sanket.bapat@yahoo.in, m.karthikeyan@ncl.res.in
P. Goel, S. Tambe and B.D. Kulkarni are with the Chemical Engineering
and Process Development Division, CSIR-National Chemical Laboratory,
Pune – 411008, India.
E-mail: goelpurva@gmail.com, ss.tambe@ncl.res.in, bdkulkarni@ncl.res.in