Predicting Calcium-Binding Sites in Proteins: A Graph Theory and Geometry Approach

Hai Deng,1 Guantao Chen,1,2 Wei Yang,3 and Jenny J. Yang3*

1 Department of Computer Science, Georgia State University, Atlanta, Georgia
2 Department of Mathematics and Statistics, Georgia State University, Atlanta, Georgia
3 Department of Chemistry, Georgia State University, Atlanta, Georgia

The Supplementary Material referred to in this article can be found at http://www.interscience.wiley.com/jpages/0887-3585/suppmat/

The first three authors contributed equally to this article.

Grant sponsor: Graduate Assistantship from the College of Arts & Sciences at Georgia State University (to H.D.); Grant sponsor: NSF; Grant number: MCB-0092486 (to J.J.Y.); Grant sponsor: NIH; Grant number: GM 62999-1 (to J.J.Y.); Grant sponsor: NSF; Grant number: DMS-0500951 (to G.C.); Grant sponsor: NSA; Grant number: H98230-04-1-0300 (to G.C.); Grant sponsor: NIH P20 Award; Grant number: P20 GM065762-01A1 (to J.J.Y. and G.C.).

*Correspondence to: Jenny J. Yang, Department of Chemistry, Georgia State University, University Plaza, Atlanta, GA 30302. E-mail: chejjy@langate.gsu.edu

Received 16 October 2005; Revised 24 January 2006; Accepted 25 January 2006

Published online 14 April 2006 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/prot.20973

ABSTRACT

Identifying calcium-binding sites in proteins is one of the first steps towards predicting and understanding the role of calcium in biological systems for protein structure and function studies. Due to the complexity and irregularity of calcium-binding sites, a fast and accurate method for predicting and identifying calcium-binding proteins is needed. Here we report our development of a new fast algorithm (GG) to detect calcium-binding sites. The GG algorithm uses a graph theory algorithm to find oxygen clusters of the protein and a geometric algorithm to identify the center of these clusters. A cluster of four or more oxygen atoms has a high potential for calcium binding. High performance with about 90% site sensitivity and 80% site selectivity has been obtained for three datasets containing a total of 123 proteins. The results suggest that a sphere of a certain size with four or more oxygen atoms on the surface and without other atoms inside is necessary and sufficient for quickly identifying the majority of the calcium-binding sites with high accuracy. Our finding opens a new avenue to visualize and analyze calcium-binding sites in proteins, facilitating the prediction of functions from structural genomic information. Proteins 2006;64:34–42. © 2006 Wiley-Liss, Inc.

Key words: calcium-binding proteins; metal-binding geometry; function prediction; graph theory; oxygen cluster

INTRODUCTION

Calcium regulates many biological processes through its interactions with numerous calcium-binding proteins [1,2]. In addition to stabilizing the proteins, calcium also induces conformational changes that switch biological functions on and off [3,4]. Calcium ions are predominantly chelated by protein oxygen atoms from the side-chain carboxyl and carboxamide groups of Asp, Glu, Asn, and Gln, the hydroxyl groups of Ser and Thr, and the main-chain carbonyl groups. In addition, calcium can be chelated by oxygen atoms from solvent water, phosphate, carbohydrates, and lipids [5,6]. Although the coordination number of calcium varies from 3 to more than 10 in small molecules, it is typically 5–8 in proteins, with an average of about 6.5–7 [7,8]. Our studies and others showed that most calcium–oxygen distances in proteins vary from 2 to 3 Å, with an average of about 2.4 Å, and that different classes of calcium-binding sites in proteins can be identified using a pentagonal bipyramidal geometry with Ca–O bond lengths of 2.4 ± 1.0 Å and common calcium-binding ligand residues [7]. This finding has allowed us to apply a geometry-based algorithm, in addition to charge and chemical properties, to design calcium-binding proteins with biological functions [8–10].

There is a strong need for a methodology to predict and visualize calcium-binding sites in proteins with high speed. Owing to rapid progress in NMR techniques, the number of solution structures of proteins has increased significantly [11]. Unfortunately, NMR cannot directly provide the calcium-binding coordination. Although X-ray crystallography has been the major tool to visualize calcium-binding sites, calcium-binding sites with weak affinities are less well defined, if not completely unknown. For example, although it is known that calcium is essential for the function of metabotropic glutamate receptors (mGluR), no calcium-binding sites were observed in several X-ray structures of the extracellular domains of the mGluR [12–14]. In addition, as the worldwide effort in structural genomics speeds up the solving of numerous protein structures, the prediction of proteins’ functions and metal-binding properties becomes more and more important [15–20]. Identifying calcium-binding sites is not only crucial for the study of individual proteins but also helpful for revealing general factors such as the mechanisms governing calcium-binding affinity, selectivity, and calcium-induced conformational change. A fast and accurate methodology for predicting calcium-binding sites will facilitate understanding and predicting the roles of calcium in biological systems (denoted as calciomics) [21–23].
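The criterion summarized in the abstract (a sphere of suitable size with four or more oxygen atoms on its surface and no other atoms inside) can be illustrated with a minimal Python sketch. This is not the authors' GG implementation. The function names (det3, circumsphere, candidate_sites), the tolerance TOL, and the oxygen–oxygen cutoff are hypothetical choices made only for illustration. The radius window of 2 to 3 Å is taken from the Ca–O distance range quoted in the Introduction, and the brute-force enumeration of four-atom groups stands in for whatever graph algorithm the method actually uses.

```python
from itertools import combinations
from math import dist

# Ca-O distance range quoted in the Introduction (angstroms).
R_MIN, R_MAX = 2.0, 3.0
# Assumption: two atoms on a sphere of radius <= R_MAX are at most 2 * R_MAX apart.
O_O_CUTOFF = 2.0 * R_MAX
# Assumption: numerical tolerance for the "no atom inside" test.
TOL = 1e-6


def det3(m):
    """Determinant of a 3x3 matrix given as three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def circumsphere(p0, p1, p2, p3):
    """Return (center, radius) of the sphere through four points,
    or None when the points are (nearly) coplanar."""
    others = (p1, p2, p3)
    # |c - p_i|^2 = |c - p0|^2  =>  2 (p_i - p0) . c = |p_i|^2 - |p0|^2
    rows = [[2.0 * (p[k] - p0[k]) for k in range(3)] for p in others]
    rhs = [sum(x * x for x in p) - sum(x * x for x in p0) for p in others]
    d = det3(rows)
    if abs(d) < 1e-9:
        return None
    center = []
    for col in range(3):  # Cramer's rule, one coordinate at a time
        m = [row[:] for row in rows]
        for r in range(3):
            m[r][col] = rhs[r]
        center.append(det3(m) / d)
    center = tuple(center)
    return center, dist(center, p0)


def candidate_sites(oxygens, other_atoms):
    """List (center, radius, oxygen_indices) for every group of four
    mutually close oxygen atoms whose circumsphere has an acceptable
    radius and contains no atom strictly inside it."""
    n = len(oxygens)
    # Graph step: connect oxygen atoms that are close enough to share a sphere.
    adjacent = {
        (i, j)
        for i, j in combinations(range(n), 2)
        if dist(oxygens[i], oxygens[j]) <= O_O_CUTOFF
    }
    all_atoms = list(oxygens) + list(other_atoms)
    sites = []
    for quad in combinations(range(n), 4):
        # Keep only fully connected groups (4-cliques of the distance graph).
        if not all(pair in adjacent for pair in combinations(quad, 2)):
            continue
        # Geometry step: sphere through the four oxygen atoms.
        result = circumsphere(*(oxygens[i] for i in quad))
        if result is None:
            continue
        center, radius = result
        if not (R_MIN <= radius <= R_MAX):
            continue
        # Empty-sphere test: reject if any atom lies strictly inside.
        if any(dist(center, atom) < radius - TOL for atom in all_atoms):
            continue
        sites.append((center, radius, quad))
    return sites
```

Calling candidate_sites with a list of oxygen coordinates and a list of all remaining atom coordinates (both as (x, y, z) tuples in Å) returns the candidate centers. Enumerating every four-atom subset is only practical for small inputs; a real implementation would restrict the search to neighbors found by a spatial index or a clique-finding routine.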