Universal Similarity Measure for Comparing Protein Structures

Marcos R. Betancourt, Jeffrey Skolnick

Laboratory of Computational Genomics, The Donald Danforth Plant Science Center, 893 N. Warson Rd., Creve Coeur, MO 63141

Received 24 January 2001; accepted 22 March 2001

Abstract: We introduce a new variant of the root mean square distance (RMSD) for comparing protein structures, a variant whose range of values is independent of protein size. This dimensionless measure (relative RMSD, or RRMSD) is zero between identical structures and one between structures that are as globally dissimilar as an average pair of random polypeptides of the same respective sizes. The RRMSD probability distribution between random polypeptides converges to a universal curve as the chain length increases. The correlation coefficients between aligned random structures are computed as a function of polypeptide size and reveal two characteristic lengths, 4.7 and 37 residues. These lengths mark the separation between phases of different structural order among native protein fragments. The implications for threading are discussed. © 2001 John Wiley & Sons, Inc. Biopolymers 59: 305–309, 2001

Keywords: protein structure correlations; relative root mean square distance; universal structure similarity measures; protein folding

INTRODUCTION

Measures of structural similarity between proteins are valuable tools for analyzing protein structures and folding simulations.¹,² One of the most commonly used measures is the coordinate root mean square distance, or RMSD³–⁶ [see Eq. (1)], which describes the mean (RMS) distance per residue between two optimally aligned structures. The RMSD can be computed analytically and has the appealing property that it directly compares the real-space coordinates of the two structures. This property makes the RMSD more sensitive to global changes in structure and allows it to differentiate between symmetry-related conformations, such as mirror images.
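To make the claims about analytic computation and mirror images concrete, here is a minimal, illustrative sketch of a coordinate RMSD after optimal superposition. It uses the quaternion eigenvalue formulation, which is one common closed-form approach. The paper's own Eq. (1) and alignment procedure are not shown in this section and may differ. The function names (`rmsd`, `distance_matrix`), the pure-Python Jacobi eigenvalue routine, and the idealized helix coordinates (radius 2.3 Å, 100° and 1.5 Å per residue, which gives roughly 3.8 Å between consecutive points) are all assumptions made for this example. The sketch also checks that a reflected copy has exactly the same interresidue distances as the original, yet a nonzero RMSD.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _centered(coords: Sequence[Vec3]) -> List[Vec3]:
    """Translate the coordinates so that their centroid lies at the origin."""
    n = len(coords)
    cx = sum(p[0] for p in coords) / n
    cy = sum(p[1] for p in coords) / n
    cz = sum(p[2] for p in coords) / n
    return [(x - cx, y - cy, z - cz) for x, y, z in coords]


def _max_eigenvalue(m: List[List[float]], max_sweeps: int = 50) -> float:
    """Largest eigenvalue of a small symmetric matrix via cyclic Jacobi rotations."""
    a = [row[:] for row in m]
    n = len(a)
    for _ in range(max_sweeps):
        total = sum(v * v for row in a for v in row)
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off <= 1e-30 * total:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # update columns p and q (A <- A P)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # update rows p and q (A <- P^T A)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return max(a[i][i] for i in range(n))


def rmsd(a: Sequence[Vec3], b: Sequence[Vec3]) -> float:
    """Coordinate RMSD after optimal superposition by a proper rotation."""
    if len(a) != len(b) or not a:
        raise ValueError("structures must have the same, nonzero number of residues")
    xa, xb = _centered(a), _centered(b)
    # 3x3 correlation matrix R[i][j] = sum_k xa_k[i] * xb_k[j]
    r = [[sum(p[i] * q[j] for p, q in zip(xa, xb)) for j in range(3)] for i in range(3)]
    (rxx, rxy, rxz), (ryx, ryy, ryz), (rzx, rzy, rzz) = r
    # Symmetric 4x4 quaternion matrix; its largest eigenvalue is the best overlap.
    f = [
        [rxx + ryy + rzz, ryz - rzy, rzx - rxz, rxy - ryx],
        [ryz - rzy, rxx - ryy - rzz, rxy + ryx, rzx + rxz],
        [rzx - rxz, rxy + ryx, -rxx + ryy - rzz, ryz + rzy],
        [rxy - ryx, rzx + rxz, ryz + rzy, -rxx - ryy + rzz],
    ]
    e0 = sum(x * x for p in xa + xb for x in p)
    msd = (e0 - 2.0 * _max_eigenvalue(f)) / len(a)
    return math.sqrt(max(msd, 0.0))  # clamp tiny negative round-off


def distance_matrix(coords: Sequence[Vec3]) -> List[List[float]]:
    """Interresidue distances, the input to contact-based measures."""
    return [[math.dist(p, q) for q in coords] for p in coords]


if __name__ == "__main__":
    # Hypothetical idealized helix-like C-alpha trace (illustration only).
    helix = [(2.3 * math.cos(math.radians(100 * k)),
              2.3 * math.sin(math.radians(100 * k)),
              1.5 * k) for k in range(10)]
    mirror = [(x, y, -z) for x, y, z in helix]  # reflection through the xy-plane
    c, s = math.cos(math.radians(40)), math.sin(math.radians(40))
    moved = [(c * x - s * y + 5.0, s * x + c * y - 3.0, z + 1.0) for x, y, z in helix]

    print(f"RMSD(helix, moved)  = {rmsd(helix, moved):.3f}")   # 0.000: rigid-body copy
    print(f"RMSD(helix, mirror) = {rmsd(helix, mirror):.3f}")  # clearly nonzero
    print("same distance matrix:", distance_matrix(helix) == distance_matrix(mirror))
```

Because the quaternion formulation searches only proper rotations, the reflected helix keeps a nonzero RMSD even though its distance matrix is identical to that of the original. Any contact-based score computed from that distance matrix would therefore be identical as well. Production code would more likely call a linear-algebra library, for example an SVD-based Kabsch superposition, than a hand-written Jacobi routine. The pure-Python version simply keeps the sketch dependency-free. In short, a superposition-based RMSD separates a structure from its mirror image.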
Other popular similarity measures do not necessarily share these properties of the RMSD. Measures based on the comparison of local coordinates (e.g., torsion angles⁷) are less sensitive to global changes, and measures based on interresidue distances (e.g., the fraction of native contacts⁸) do not distinguish between mirror images. These and other measures are convenient for specific applications, such as finding common substructures in proteins.⁹ However, the RMSD is arguably one of the most discriminating and convenient measures for comparing the global structure of proteins.

Correspondence to: Jeffrey Skolnick

While most similarity measures can identify two identical structures, how dissimilar two nonidentical structures are judged to be depends strictly on the measure used. In particular, the correlation between the RMSD and other similarity measures is typically high