PROTEINS: Structure, Function, and Bioinformatics

On the relation between residue flexibility and local solvent accessibility in proteins

Hua Zhang,1,2* Tuo Zhang,1,2 Ke Chen,2 Shiyi Shen,1,3 Jishou Ruan,1,3 and Lukasz Kurgan2*

1 College of Mathematical Science and LPMC, Nankai University, Tianjin, People's Republic of China
2 Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Alberta, Canada
3 Chern Institute of Mathematics, Nankai University, Tianjin, People's Republic of China

INTRODUCTION

Proteins undergo constant thermal fluctuations and other types of motion, ranging from rapid (picosecond) vibrations to relatively slow (microsecond to second) movements. 1 The structural flexibility associated with these motions enables various biological processes, such as molecular recognition, enzyme catalysis, allosteric regulation, antigen–antibody interactions, and protein–DNA binding. 2–6

Structural data derived from X-ray crystallography provide information on atomic mobility, which is represented by the atomic displacement parameter, also known as the Debye–Waller temperature factor or B-factor. This parameter reflects the dispersal of an atom's electron density around its equilibrium position due to thermal motion and positional disorder. B-factors have been studied from a variety of viewpoints, including the relation between mobility and thermal stability, 7,8 the prediction of active sites and binding sites, 9–12 the design of potential functions, 13 and protein function analysis and discovery. 2,6,14–16

Molecular dynamics (MD) simulation is one of the most powerful computational methods used to describe and analyze protein flexibility. Its main drawback is its high computational cost.
17–19 To overcome this limitation, several prediction methods that address protein flexibility and its relation to protein function have been developed. They include structure-based 20–24 and sequence-based 25–30 methods, both of which use the B-factor as the enabling concept. Recent studies show that structure-based methods, such as the Gaussian network model (GNM), 21 the mean-field-like model, 17 the elastic network model (ENM), 19 the protein fixed-point (PFP) model, 23 and the weighted contact number (WCN) model, 24 could provide better insights into the structure–dynamics–function relation-

Additional Supporting Information may be found in the online version of this article.

Grant sponsors: NSERC (Canada), National Education Committee of China, NSFC; Grant number: 10671100; Grant sponsors: Liuhui Center for Applied Mathematics, the joint program of Tianjin and Nankai Universities, Alberta Ingenuity Fund, iCORE.

*Correspondence to: Hua Zhang, College of Mathematical Science, Nankai University, Tianjin 300071, People's Republic of China. E-mail: zerohua@gmail.com or Lukasz Kurgan, Department of Electrical and Computer Engineering, ECERF (9107 116 Street), University of Alberta, Edmonton, AB, Canada T6G 2V4. E-mail: lkurgan@ece.ualberta.ca.

Received 3 October 2008; Revised 5 December 2008; Accepted 16 December 2008

Published online 20 January 2009 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/prot.22375

ABSTRACT

We investigate the relationship between flexibility, expressed by the B-factor, and relative solvent accessibility (RSA) within the local sequence neighborhood, as well as related concepts such as residue depth. We observe that the flexibility of a given residue is strongly influenced by the solvent accessibility of its adjacent neighbors.
The mean normalized B-factor of exposed residues with two buried neighbors is smaller than that of buried residues with two exposed neighbors. Including the RSA of neighboring residues (local RSA) significantly increases the correlation with the B-factor. The correlation between local RSA and B-factor is shown to be stronger than the correlation that considers local distance- or volume-based residue depth. We also find that the correlation coefficients between B-factor and RSA for the 20 amino acids, called the flexibility-exposure correlation index, are strongly correlated with the stability scale that characterizes the average contribution of each amino acid to folding stability. Our results reveal that predicted RSA could be used to distinguish between disordered and ordered residues, and that including local predicted RSA values provides a better contrast between these two types of residues. Prediction models based on local actual RSA and local predicted RSA perform similarly to or better than several existing approaches for B-factor and disorder prediction. We validate our models using three case studies, which show that this work provides useful clues for deciphering the structure–flexibility–function relation. Proteins 2009; 76:617–636. © 2009 Wiley-Liss, Inc.

Key words: B-factor; disordered regions; ordered regions; flexible region; residue depth; secondary structure; active site.
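The quantities named in the abstract (normalized B-factor, local RSA, and exposed or buried neighbor patterns) can be illustrated with a minimal Python sketch. It rests on assumptions that this section does not state: B-factors are z-score normalized per chain, local RSA uses a window of one residue on each side and is averaged into a single feature, and an RSA cutoff of 0.25 separates buried from exposed residues. These choices, and all function names below, are hypothetical illustrations, not the procedure defined in the paper.

```python
import math
from statistics import mean, pstdev

# Illustrative sketch only: normalization scheme, window size, and exposure
# cutoff are ASSUMPTIONS, not the settings used in the paper.
EXPOSURE_THRESHOLD = 0.25  # hypothetical RSA cutoff: >= cutoff means exposed


def normalize_b_factors(b_factors):
    """Z-score the B-factors of one chain (assumed normalization)."""
    mu, sigma = mean(b_factors), pstdev(b_factors)
    if sigma == 0:
        return [0.0] * len(b_factors)
    return [(b - mu) / sigma for b in b_factors]


def local_rsa(rsa, i, half_window=1):
    """RSA of residue i plus its sequence neighbors within half_window.

    Neighbors beyond the chain termini are simply skipped.
    """
    lo = max(0, i - half_window)
    hi = min(len(rsa), i + half_window + 1)
    return rsa[lo:hi]


def mean_local_rsa(rsa, half_window=1):
    """Average each residue's local RSA window into one value
    (one hypothetical way to combine neighbor RSA into a feature)."""
    return [mean(local_rsa(rsa, i, half_window)) for i in range(len(rsa))]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def exposure_states(rsa, threshold=EXPOSURE_THRESHOLD):
    """Label each residue 'E' (exposed) or 'B' (buried) by an RSA cutoff."""
    return ["E" if r >= threshold else "B" for r in rsa]


def mean_nb_for_triplet(norm_b, rsa, pattern, threshold=EXPOSURE_THRESHOLD):
    """Mean normalized B-factor of interior residues whose (left, self, right)
    exposure states match pattern; e.g. 'BEB' is an exposed residue with two
    buried neighbors. Returns NaN if no residue matches."""
    states = exposure_states(rsa, threshold)
    values = [
        norm_b[i]
        for i in range(1, len(rsa) - 1)
        if states[i - 1] + states[i] + states[i + 1] == pattern
    ]
    return mean(values) if values else float("nan")


if __name__ == "__main__":
    # Made-up toy chain for demonstration; NOT data from the paper.
    b = [12.0, 30.0, 15.0, 40.0, 18.0, 22.0, 35.0]
    rsa = [0.05, 0.60, 0.10, 0.80, 0.15, 0.05, 0.70]
    nb = normalize_b_factors(b)
    print("r(nB, RSA)       =", round(pearson(nb, rsa), 3))
    print("r(nB, local RSA) =", round(pearson(nb, mean_local_rsa(rsa)), 3))
    print("mean nB, BEB     =", mean_nb_for_triplet(nb, rsa, "BEB"))
    print("mean nB, EBE     =", mean_nb_for_triplet(nb, rsa, "EBE"))
```

Averaging the window is only one option; a regression over the individual neighbor RSA values would keep left and right neighbors separate. Running `pearson` separately over the residues of each amino acid type would give a per-type coefficient in the spirit of the flexibility-exposure correlation index, again under the assumptions above.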