Protein Science (1995), 4:2392-2404. Cambridge University Press. Printed in the USA.
Copyright © 1995 The Protein Society

Significance of structural changes in proteins: Expected errors in refined protein structures

ROBERT M. STROUD AND ERIC B. FAUMAN¹

Department of Biochemistry and Biophysics, University of California-San Francisco, San Francisco, California 94143-0448

(RECEIVED June 6, 1995; ACCEPTED September 11, 1995)

Abstract

A quantitative expression key to evaluating significant structural differences or induced shifts between any two protein structures is derived. Because crystallography leads to reports of a single (or sometimes dual) position for each atom, the significance of any structural change based on comparison of two structures depends critically on knowing the expected precision of each median atomic position reported, and on extracting it for each atom, from the information provided in the Protein Data Bank and in the publication. The differences between structures of protein molecules that should be identical, and that are normally distributed, indicating that they are not affected by crystal contacts, were analyzed with respect to many potential indicators of structure precision, so as to extract, essentially by "machine learning" principles, a generally applicable expression involving the highest correlates. Eighteen refined crystal structures from the Protein Data Bank, in which there are multiple molecules in the crystallographic asymmetric unit, were selected and compared. The thermal B factor, the connectivity of the atom, and the ratio of the number of reflections to the number of atoms used in refinement correlate best with the magnitude of the positional differences between regions of the structures that otherwise would be expected to be the same.
These results are embodied in a six-parameter equation that can be applied to any crystallographically refined structure to estimate the expected uncertainty in position of each atom. Structure change in a macromolecule can thus be referenced to the expected uncertainty in atomic position as reflected in the variance between otherwise identical structures with the observed values of correlated parameters.

Keywords: accuracy; B factor; conformation change; crystallography; errors; positional difference; protein structure

Reprint requests to: Robert M. Stroud, S-960 Department of Biochemistry and Biophysics, University of California-San Francisco, San Francisco, California 94143-0448; e-mail: stroud@msg.ucsf.edu.

¹ Present address: Biophysics Research Division, IST 1208 Box 2099, University of Michigan, Ann Arbor, Michigan 48109-2099.

Abbreviations: $\Delta x$, $\Delta y$, $\Delta z$, difference in position of a single atom between a pair of structures along the x, y, or z axis, respectively; $x$, generalized one-dimensional axis that represents the average over all possible orientations; $\sigma_x$, standard deviation in one dimension of the Gaussian portion of positional differences between a pair of structures; $\Delta r$, the difference distance in position between atoms in different structures, $\Delta r = \sqrt{\Delta x^2 + \Delta y^2 + \Delta z^2}$; $\sigma_r$, standard deviation of the Maxwellian distribution of positional differences in a pair of structures: $\sigma_r = \sqrt{3}\,\sigma_x$; $\Delta r_{\mathrm{true}}$, distance between the observed position of an atom and its "true" position; $\sigma_{r,\mathrm{true}}$, standard deviation of the Maxwellian distribution of $\Delta r_{\mathrm{true}}$: $\sigma_{r,\mathrm{true}} = (\sqrt{2}/2)\,\sigma_r$; ATOM, number of independently refined atom positions in the asymmetric unit; REFL, number of independent reflections used in refinement; $t_x(B, \mathrm{ATOM/REFL})$, empirically derived estimate of $\sigma_x$ as a function of B factor and the ratio ATOM/REFL for a given structure; N.E.S.(subset), normalized error score, defined as the deviation from $t_x(B, \mathrm{ATOM/REFL})$ for a selected subset of atoms; N.E.S.$_i$(subset), normalized error score calculated using only structure $i$ of the 18 structures used in the analysis; $\sigma_{\mathrm{NES}}$(subset), standard deviation of the 18 values for N.E.S.$_i$(subset).

The significance of structural differences between macromolecules is best evaluated by reference to the expected distribution of uncertainty, or positional variation, in the regions that are compared. Such variations differ widely for different regions of protein structure, as reflected in electron density maps and deduced thermal factors, and depend on connectivity of the atom and applied constraints, resolution of the analysis, number of observations, method of refinement, and other factors. Structures determined by NMR are often represented as a manifold of structures that are consistent with the data, because the errors in closely coupled distances and angles are cumulative for regions of sequence that are separated by longer through-bond distances.
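The relations among the quantities defined in the abbreviations ($\Delta r$, $\sigma_x$, $\sigma_r$, and $\sigma_{r,\mathrm{true}}$) can be illustrated with a small numerical sketch. This is not the analysis performed in the paper: the function names, the simulated coordinates, and the per-coordinate error of 0.2 Å are all hypothetical, the errors are assumed to be independent and purely Gaussian in each copy, and $\sigma_r$ is taken as the root-mean-square value of $\Delta r$, which is what the relation $\sigma_r = \sqrt{3}\,\sigma_x$ implies.

```python
import math
import random


def delta_r(atom_a, atom_b):
    """Distance between the positions of one atom in two structures."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(atom_a, atom_b)))


def rms(values):
    """Root-mean-square of a sequence of numbers."""
    values = list(values)
    return math.sqrt(sum(v * v for v in values) / len(values))


def simulate_pair(n_atoms=20000, sigma_coord=0.2, seed=0):
    """Simulate two copies of one structure with independent Gaussian errors.

    sigma_coord is a hypothetical per-coordinate error (in angstroms) of
    each copy relative to the "true" position; it is not a value from the
    paper.
    """
    rng = random.Random(seed)
    truth = [[rng.uniform(0.0, 50.0) for _ in range(3)] for _ in range(n_atoms)]

    def noisy_copy():
        return [[c + rng.gauss(0.0, sigma_coord) for c in atom] for atom in truth]

    copy_a, copy_b = noisy_copy(), noisy_copy()

    # sigma_x: pooled RMS of the one-dimensional differences (dx, dy, dz).
    components = [p - q for a, b in zip(copy_a, copy_b) for p, q in zip(a, b)]
    sigma_x = rms(components)

    # sigma_r: RMS of delta_r between the two copies.
    sigma_r = rms(delta_r(a, b) for a, b in zip(copy_a, copy_b))

    # RMS distance of copy A from the true positions, measured directly.
    sigma_r_true_measured = rms(delta_r(a, t) for a, t in zip(copy_a, truth))

    return {
        "sigma_x": sigma_x,
        "sigma_r": sigma_r,
        # Estimate from the pair alone: sigma_r_true = sigma_r / sqrt(2).
        "sigma_r_true_from_pair": sigma_r / math.sqrt(2.0),
        "sigma_r_true_measured": sigma_r_true_measured,
    }
```

Calling `simulate_pair()` returns a dictionary in which `sigma_r` equals $\sqrt{3}$ times `sigma_x` by construction, and `sigma_r_true_from_pair` should agree with `sigma_r_true_measured` to within sampling error. The factor $1/\sqrt{2}$ arises because the difference between two independently determined positions carries the error of both.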
Here we focus on structures of proteins as determined by X-ray crystallography. We derive a readily accessible calculation that best predicts the expected positional uncertainty for any atom in any particular protein structure determination, from the information on the refined structure readily available in the Protein Data Bank format, or from publications of the structure analysis.

There is information that directly pertains to positional variance in the course of crystallographic refinement, the resolution