978-1-7281-1867-3/19/$31.00 ©2019 IEEE
Capsule Network for Predicting Zinc Binding Sites
in Metalloproteins
Clement Essien
Department of Electrical Engineering
and Computer Science
Christopher S. Bond Life Sciences
Center
University of Missouri, Columbia, MO
65211, USA
u.c.essien@mail.missouri.edu
Duolin Wang
Department of Electrical Engineering
and Computer Science
Christopher S. Bond Life Sciences
Center
University of Missouri, Columbia, MO
65211, USA
wangdu@missouri.edu
Dong Xu
Department of Electrical Engineering
and Computer Science
Christopher S. Bond Life Sciences
Center
University of Missouri, Columbia, MO
65211, USA
xudong@missouri.edu
Abstract
Zinc is an important cofactor for various biological functions
in plants and animals that are usually associated with
proteins. Zinc also plays an important role in the structures of
the proteins to which it binds. Hence, it is important to predict
the zinc binding sites in these proteins to better understand
their structures and functions. Most of the existing
tools developed in this domain are structure-based predictors
implementing Support Vector Machines on datasets that are
more than a decade old. Because little work has explored
the use of deep learning frameworks for this problem, we propose
ZinCaps, a framework based on the capsule network for
predicting zinc binding sites using sequence-only information on
more recently compiled datasets. ZinCaps outperforms
previous tools. Its source code is freely available for download
at https://github.com/clemEssien/ActiveSitePrediction.
Keywords—zinc metal binding site; metalloproteins; deep
learning; capsule network
I. INTRODUCTION
Many proteins interact with metals to perform certain
biological functions. Metal-binding proteins known as
metalloproteins require specific metal cation(s) to function
properly. Such metal ions play major roles in a wide range of
cellular processes and are also useful in the development of
metal-based drugs such as anticancer drugs [1]. The two
categories of these metals are the alkali/alkaline earth metals
and transition metals. The former play a structural role while
the latter play both structure stabilization and catalysis roles
[2].
Zinc is a transition metal and, after iron, the second most
abundant trace metal in the human body. A
70 kg adult human has about 2.3 g of zinc [3]. It is required for
more than 300 enzyme activities spanning all six classes
of enzymes. The zinc (Zn²⁺) cofactor is essential for several
biological functions in plants and animals. When observed in
tissues, zinc is mainly associated with proteins [3]. As many
as 40% of the human proteins that bind zinc are
transcription factors, while the rest are usually
enzymes or proteins involved in ion transport [4]. Zinc has
several chemical properties that lead to a variety of functions.
It does not undergo redox reactions because its d-shell is filled,
unlike those of other first-row transition metals. As a result, it
offers stability in biological environments that are
characterized by fluctuating redox potentials [5].
Covalent zinc binding is one of the most important
post-translational modifications in proteins. A zinc binding site comprises the
sulfur of cysteine, the nitrogen of histidine and/or the oxygen
of aspartate and glutamate [3]. Histidine is the most frequently observed ligand,
followed by cysteine [6]. There is a correlation between the
number of amino acids to which the zinc atom binds and
the activity of the metalloprotein [7].
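Because only four residue types (Cys, His, Asp and Glu) are named above as zinc ligands, a sequence-only predictor can restrict its attention to those positions. The following is a minimal sketch of that idea, not the ZinCaps implementation: the function name `candidate_windows`, the flank size and the padding character are illustrative assumptions.

```python
from typing import List, Tuple

# One-letter codes for the residues named in the text as zinc ligands:
# Cys (C), His (H), Asp (D) and Glu (E).
CANDIDATE_RESIDUES = frozenset("CHDE")


def candidate_windows(
    sequence: str, flank: int = 3, pad: str = "-"
) -> List[Tuple[int, str, str]]:
    """Return (1-based position, residue, window) for every candidate residue.

    The window holds `flank` residues on each side of the candidate.
    Positions past either end of the sequence are filled with `pad`.
    The flank size and padding symbol are assumptions for illustration.
    """
    seq = sequence.upper()
    padded = pad * flank + seq + pad * flank
    windows = []
    for i, residue in enumerate(seq):
        if residue in CANDIDATE_RESIDUES:
            # Residue i sits at index i + flank in the padded string,
            # so the centered window starts at index i.
            windows.append((i + 1, residue, padded[i : i + 2 * flank + 1]))
    return windows


if __name__ == "__main__":
    for position, residue, window in candidate_windows("MKCAAH", flank=2):
        print(position, residue, window)
```

Each window could then be encoded (for example, one-hot) and passed to a classifier; the encoding and the classifier are outside the scope of this sketch.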
There are three primary types of zinc sites: structural,
catalytic and cocatalytic. Structural zinc binds
to four amino acids with no bound water molecule. Cysteine
(Cys) is preferred in this site. Structural zinc essentially
maintains the stability of the protein tertiary structure without
taking part in the biochemical reaction [8] [9]. Catalytic zinc
binds a water molecule and three amino acid residues, which
can contribute any combination of nitrogen, oxygen and
sulfur donors. Catalytic zinc is actively involved in biochemical
reactions. Histidine (His) is preferred for these sites.
Cocatalytic zinc sites interact with other metal ions (usually
two or three), typically linked through a side-chain atom or a water
molecule, to carry out their function [10]. Aspartate (Asp) and
glutamate (Glu) are the preferred amino acids in these sites.
There is also a fourth type of zinc binding site that arises from
the influence of zinc on the quaternary protein structure,
where zinc ions bind to one or two amino acid residues (Asp,
Glu or His but not Cys) on the protein surface during
crystallization. These sites have neither biological nor catalytic
function [9].
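The site types described above can be summarized as a small lookup table. This is only an illustrative sketch of the rules as stated in the text: the class name `ZincSite`, the field names, the helper `site_types_for` and the label "crystallization" for the fourth site type are assumptions made here, and fields the text does not specify are left as `None`.

```python
from typing import Dict, List, NamedTuple, Optional


class ZincSite(NamedTuple):
    residues_bound: Optional[str]  # residue count as stated in the text, None if not stated
    water_bound: Optional[bool]    # None where the text does not say
    typical_residues: frozenset    # one-letter codes of residues named for the site
    role: str


# Keys are illustrative labels; "crystallization" is a label chosen here
# for the fourth site type, which the text does not name.
ZINC_SITES: Dict[str, ZincSite] = {
    "structural": ZincSite("4", False, frozenset("C"),
                           "stabilizes the tertiary structure"),
    "catalytic": ZincSite("3", True, frozenset("H"),
                          "takes part in the biochemical reaction"),
    "cocatalytic": ZincSite(None, None, frozenset("DE"),
                            "acts together with other metal ions"),
    "crystallization": ZincSite("1-2", None, frozenset("DEH"),
                                "no biological or catalytic function"),
}


def site_types_for(residue: str) -> List[str]:
    """List the site types for which the text names the given residue."""
    code = residue.upper()
    return [name for name, site in ZINC_SITES.items()
            if code in site.typical_residues]


if __name__ == "__main__":
    for code in "CHDE":
        print(code, site_types_for(code))
```

The table records preferences only; real sites mix ligand types, so a residue's identity alone does not determine the site type.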
Due to the rapid expansion of protein databases, it is
becoming important to identify zinc binding sites to
understand their functions in metalloproteins. This would be
useful for the prediction of protein structure and function.
Because determining zinc binding sites using experimental
techniques is laborious and costly, computational tools
based on machine learning models have been developed
for this purpose.
Attempts have been made to predict zinc binding sites in
metalloproteins from protein sequences. Ref. [11] built
predictors for zinc-binding Cys, His, Glu and Asp by training a
support vector machine (SVM) classifier on position-specific
scoring matrix (PSSM) features obtained from PSI-BLAST.
Ref. [12] presented a two-stage approach that uses an SVM and
a recurrent neural network (RNN) in the first and second stages,
respectively. It predicts whether each Cys or His is free,
metal-bound or part of a disulfide bridge. While the first
stage predicts the binding states, the second refines
these predictions by considering dependencies between the protein
residues. They achieved 73% precision and 61% recall
when predicting zinc binding sites in proteins. Ref. [13]
proposed ZincPred which combined SVM with homology-