Predicting Calcium-Binding Sites in Proteins—A Graph
Theory and Geometry Approach
Hai Deng,¹ Guantao Chen,¹,² Wei Yang,³ and Jenny J. Yang³*
¹Department of Computer Science, Georgia State University, Atlanta, Georgia
²Department of Mathematics and Statistics, Georgia State University, Atlanta, Georgia
³Department of Chemistry, Georgia State University, Atlanta, Georgia
ABSTRACT Identifying calcium-binding sites in proteins is one of the first steps towards predicting and understanding the role of calcium in biological systems for protein structure and function studies. Due to the complexity and irregularity of calcium-binding sites, a fast and accurate method for predicting and identifying calcium-binding proteins is needed. Here we report our development of a new fast algorithm (GG) to detect calcium-binding sites. The GG algorithm uses a graph theory algorithm to find oxygen clusters of the protein and a geometric algorithm to identify the center of these clusters. A cluster of four or more oxygen atoms has a high potential for calcium binding. High performance, with about 90% site sensitivity and 80% site selectivity, has been obtained for three datasets containing a total of 123 proteins. The results suggest that a sphere of a certain size with four or more oxygen atoms on the surface and without other atoms inside is necessary and sufficient for quickly identifying the majority of the calcium-binding sites with high accuracy. Our finding opens a new avenue to visualize and analyze calcium-binding sites in proteins, facilitating the prediction of functions from structural genomic information. Proteins 2006;64:34–42.
© 2006 Wiley-Liss, Inc.
Key words: calcium-binding proteins; metal-binding geometry; function prediction; graph theory; oxygen cluster
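The cluster-and-sphere idea summarized in the abstract can be illustrated with a minimal sketch. This is not the GG implementation. It assumes oxygen and non-oxygen atom coordinates are already available as (x, y, z) tuples in Å, and the function names (`circumsphere`, `candidate_sites`) and all cutoffs (`max_oo`, `r_min`, `r_max`, `tol`) are hypothetical values chosen for illustration only. The graph step links oxygens that lie close together and enumerates groups of four mutually linked oxygens by brute force; the geometric step computes the sphere through each group and keeps it only if its radius is plausible and no other atom lies inside.

```python
from itertools import combinations
from math import dist


def circumsphere(p0, p1, p2, p3):
    """Center and radius of the sphere through four points, or None if coplanar."""
    # |c - pi|^2 = |c - p0|^2  gives the linear system  2 (pi - p0) . c = |pi|^2 - |p0|^2
    rows, rhs = [], []
    for p in (p1, p2, p3):
        rows.append([2.0 * (p[k] - p0[k]) for k in range(3)])
        rhs.append(sum(p[k] ** 2 - p0[k] ** 2 for k in range(3)))

    def det3(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

    d = det3(rows)
    if abs(d) < 1e-9:  # coplanar points: no unique sphere
        return None
    center = []
    for col in range(3):  # Cramer's rule, one coordinate at a time
        m = [row[:] for row in rows]
        for r in range(3):
            m[r][col] = rhs[r]
        center.append(det3(m) / d)
    return tuple(center), dist(center, p0)


def candidate_sites(oxygens, other_atoms,
                    max_oo=6.0, r_min=2.0, r_max=3.0, tol=0.1):
    """Return (center, radius, oxygen indices) for each accepted 4-oxygen cluster.

    All cutoffs are illustrative assumptions, not the published parameters.
    """
    n = len(oxygens)
    # Graph step: two oxygens are adjacent when they are within max_oo of each other.
    adj = [[i != j and dist(oxygens[i], oxygens[j]) <= max_oo for j in range(n)]
           for i in range(n)]
    sites = []
    for quad in combinations(range(n), 4):
        # Keep only groups in which every pair is adjacent (a 4-clique).
        if not all(adj[a][b] for a, b in combinations(quad, 2)):
            continue
        sphere = circumsphere(*(oxygens[i] for i in quad))
        if sphere is None:
            continue
        center, radius = sphere
        if not (r_min <= radius <= r_max):
            continue
        # Geometric step: no atom may lie inside the sphere; extra oxygens
        # on the surface (within tol) are allowed.
        rest = list(other_atoms) + [o for i, o in enumerate(oxygens) if i not in quad]
        if all(dist(center, a) >= radius - tol for a in rest):
            sites.append((center, radius, quad))
    return sites
```

Enumerating all groups of four is cubic-to-quartic in the number of oxygens, so a real tool would restrict the search to neighbors in the graph; the brute-force loop is kept here only to make the two steps easy to follow.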
INTRODUCTION
Calcium regulates many biological processes through its interactions with numerous calcium-binding proteins.¹,² In addition to stabilizing proteins, calcium induces conformational changes that switch biological functions on and off.³,⁴ Calcium ions are predominantly chelated by protein oxygen atoms: side-chain oxygens of Asp, Glu, Asn, and Gln (carboxyl and carboxamide groups), hydroxyl oxygens of Ser and Thr, and main-chain carbonyl oxygens. Calcium can also be chelated by oxygen atoms from solvent water, phosphate, carbohydrates, and lipids.⁵,⁶ Although the coordination number of calcium varies from 3 to more than 10 in small molecules, it is typically 5–8 in proteins, with an average of about 6.5–7.⁷,⁸ Our studies and those of others showed that most calcium–oxygen distances in proteins range from 2 to 3 Å, with an average of about 2.4 Å, and that different classes of calcium-binding sites in proteins can be identified using a pentagonal bipyramidal geometry with Ca–O bond lengths of 2.4 ± 1.0 Å and common calcium-binding ligand residues.⁷ This finding has allowed us to apply a geometry-based algorithm, in addition to charge and chemical properties, to design calcium-binding proteins with biological functions.⁸⁻¹⁰
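As a small illustration of the distance and coordination figures quoted above, the following sketch counts the oxygen atoms that fall within the 2 to 3 Å range of a given calcium position and reports their mean distance. The function name `coordination` and its default cutoffs are assumptions made for this example; they are not taken from the authors' software.

```python
from math import dist


def coordination(calcium, oxygens, d_min=2.0, d_max=3.0):
    """Count oxygens whose distance to the calcium position lies in [d_min, d_max] Å.

    Returns (count, mean distance); the mean is None when no oxygen qualifies.
    The default window is an illustrative assumption based on the 2-3 Å range.
    """
    ds = [d for d in (dist(calcium, o) for o in oxygens) if d_min <= d <= d_max]
    mean = sum(ds) / len(ds) if ds else None
    return len(ds), mean
```

Comparing the returned count with the typical range of 5–8 ligands gives a quick plausibility check for a proposed site; oxygens from water or other non-protein ligands would need to be included in `oxygens` for the count to be meaningful.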
There is a strong need for methodology to predict and visualize calcium-binding sites in proteins at high speed. Owing to rapid progress in NMR techniques, the number of solved solution structures of proteins has increased significantly.¹¹ Unfortunately, NMR cannot directly provide the calcium-binding coordination. Although X-ray crystallography has been the major tool for visualizing calcium-binding sites, sites with weak affinities are poorly defined, if not completely unknown. For example, although calcium is known to be essential for the function of metabotropic glutamate receptors (mGluR), no calcium-binding sites were observed in several X-ray structures of the extracellular domains of mGluR.¹²⁻¹⁴ In addition, as structural genomics efforts worldwide accelerate and solve numerous protein structures, predicting protein functions and metal-binding properties becomes increasingly important.¹⁵⁻²⁰ Identifying calcium-binding sites is not only crucial for the study of individual proteins but also helpful for revealing general factors such as the mechanisms governing calcium-binding affinity, selectivity, and calcium-induced conformational change. A fast and accurate methodology for predicting calcium-binding sites will facilitate understanding and predicting the roles of calcium in biological systems (denoted as calciomics).²¹⁻²³
The Supplementary Material referred to in this article can be found
at http://www.interscience.wiley.com/jpages/0887-3585/suppmat/
The first three authors contributed equally to this article.
Grant sponsor: Graduate Assistantship from the College of Arts & Sciences at Georgia State University (to H.D.); Grant sponsor: NSF; Grant number: MCB-0092486 (to J.J.Y.); Grant sponsor: NIH; Grant number: GM 62999-1 (to J.J.Y.); Grant sponsor: NSF; Grant number: DMS-0500951 (to G.C.); Grant sponsor: NSA; Grant number: H98230-04-1-0300 (to G.C.); Grant sponsor: NIH P20 Award; Grant number: P20 GM065762-01A1 (to J.J.Y. and G.C.).
*Correspondence to: Jenny J. Yang, Department of Chemistry,
Georgia State University, University Plaza, Atlanta, GA 30302.
E-mail: chejjy@langate.gsu.edu
Received 16 October 2005; Revised 24 January 2006; Accepted 25
January 2006
Published online 14 April 2006 in Wiley InterScience
(www.interscience.wiley.com). DOI: 10.1002/prot.20973
PROTEINS: Structure, Function, and Bioinformatics 64:34–42 (2006)