Universal Similarity Measure for Comparing Protein Structures
Marcos R. Betancourt
Jeffrey Skolnick
Laboratory of Computational Genomics,
The Donald Danforth Plant Science Center,
893 N. Warson Rd.,
Creve Coeur, MO 63141
Received 24 January 2001; accepted 22 March 2001
Abstract: We introduce a new variant of the root mean square distance (RMSD) for comparing protein structures, whose range of values is independent of protein size. This new dimensionless measure (relative RMSD, or RRMSD) is zero between identical structures and one between structures that are as globally dissimilar as an average pair of random polypeptides of the respective sizes. The RRMSD probability distribution between random polypeptides converges to a universal curve as the chain length increases. The correlation coefficients between aligned random structures are computed as a function of polypeptide size, revealing two characteristic lengths of 4.7 and 37 residues. These lengths mark the separation between phases of different structural order between native protein fragments. The implications for threading are discussed. © 2001 John Wiley & Sons, Inc. Biopolymers 59: 305–309, 2001
Keywords: protein structure correlations; relative root mean square distance; universal structure
similarity measures; protein folding
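The abstract characterizes the RRMSD by two anchor points: zero for identical structures and one for an average pair of random polypeptides of the same size. A minimal Python sketch consistent with that description divides the optimal-superposition RMSD by a Monte Carlo estimate of the mean RMSD between random chains of length N. This is an assumption for illustration only: the paper's own definition and its ensemble of random polypeptides are not reproduced here. The random chains below are a crude freely jointed stand-in (3.8 Å virtual bonds, no excluded volume), the sketch assumes both structures have the same length with a one-to-one residue correspondence, and the names `kabsch_rmsd`, `random_chain`, `random_pair_reference`, and `rrmsd` are hypothetical.

```python
import numpy as np

CA_CA_BOND = 3.8  # approximate C-alpha virtual-bond length in angstroms (assumed chain model)


def kabsch_rmsd(a, b):
    """RMSD between two (N, 3) coordinate sets after optimal superposition
    (translation plus proper rotation, via the Kabsch/SVD solution)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a - a.mean(axis=0)                      # remove translation
    b = b - b.mean(axis=0)
    u, s, vt = np.linalg.svd(a.T @ b)           # SVD of the 3x3 covariance matrix
    d = np.sign(np.linalg.det(u) * np.linalg.det(vt))  # -1 if best fit would be a reflection
    msd = (np.sum(a * a) + np.sum(b * b) - 2.0 * (s[0] + s[1] + d * s[2])) / len(a)
    return float(np.sqrt(max(msd, 0.0)))


def random_chain(n, rng, bond=CA_CA_BOND):
    """Hypothetical random 'polypeptide': a freely jointed chain of n beads
    with fixed bond length; self-overlap is NOT prevented."""
    steps = rng.normal(size=(n - 1, 3))
    steps *= bond / np.linalg.norm(steps, axis=1, keepdims=True)  # random unit directions * bond
    return np.vstack([np.zeros((1, 3)), np.cumsum(steps, axis=0)])


def random_pair_reference(n, n_pairs=200, seed=0):
    """Monte Carlo estimate of the mean RMSD between two random n-residue chains."""
    rng = np.random.default_rng(seed)
    values = [kabsch_rmsd(random_chain(n, rng), random_chain(n, rng)) for _ in range(n_pairs)]
    return float(np.mean(values))


def rrmsd(a, b, reference):
    """Assumed normalization: 0 for identical structures, 1 at the random-pair mean RMSD."""
    return kabsch_rmsd(a, b) / reference


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    for n in (20, 50, 100):
        ref = random_pair_reference(n)
        x, y = random_chain(n, rng), random_chain(n, rng)
        # The raw reference RMSD grows with n; the normalized value for a random pair stays near 1.
        print(n, round(ref, 1), round(rrmsd(x, x, ref), 3), round(rrmsd(x, y, ref), 2))
```

The point of the normalization is visible in the printed columns: the reference RMSD increases with chain length, whereas the ratio for an arbitrary random pair fluctuates around one regardless of size.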
INTRODUCTION
Measures of structural similarity between proteins are a valuable tool for the analysis of protein structures and folding simulations.¹,² One of the most commonly used measures is the coordinate root mean square distance, or RMSD³⁻⁶ [see Eq. (1)], which describes the mean (RMS) distance per residue between two optimally aligned structures. The RMSD can be computed analytically and has the appealing property that it directly compares the real-space coordinates of the two structures. The latter property makes the RMSD more sensitive to global changes in the structure and allows it to differentiate between symmetry-related conformations such as mirror images. This is not necessarily the case for other popular similarity measures, such as those based on the comparison of local coordinates (e.g., torsion angles⁷), which are less sensitive to global changes, or those based on interresidue distances (e.g., the fraction of native contacts⁸), which do not distinguish between mirror images. These and other measures are convenient for specific applications such as finding common substructures in proteins.⁹ However, the RMSD is arguably one of the most discriminating and convenient measures for comparing the global structure of proteins.
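The contrast drawn above between the coordinate RMSD and distance-based measures can be checked numerically. The Python sketch below computes the RMSD after optimal superposition with the Kabsch algorithm (SVD, restricted to proper rotations) and a simple fraction-of-native-contacts score, then applies both to an idealized helix and its mirror image. The helix geometry (radius 2.3 Å, rise 1.5 Å, 100° per residue), the 7 Å contact cutoff, and the minimum sequence separation of 3 are illustrative assumptions rather than values from this paper, and the function names are hypothetical.

```python
import numpy as np


def kabsch_rmsd(a, b):
    """RMSD between two (N, 3) coordinate sets after optimal superposition.
    Only proper rotations are allowed, so mirror images are NOT superimposable."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a - a.mean(axis=0)                      # remove translation
    b = b - b.mean(axis=0)
    u, s, vt = np.linalg.svd(a.T @ b)           # SVD of the 3x3 covariance matrix
    d = np.sign(np.linalg.det(u) * np.linalg.det(vt))  # -1 if best fit would be a reflection
    msd = (np.sum(a * a) + np.sum(b * b) - 2.0 * (s[0] + s[1] + d * s[2])) / len(a)
    return float(np.sqrt(max(msd, 0.0)))


def contact_fraction(ref, model, cutoff=7.0, min_sep=3):
    """Fraction of the reference's contacts (distance < cutoff, sequence
    separation >= min_sep) that are also contacts in the model."""
    ref = np.asarray(ref, dtype=float)
    model = np.asarray(model, dtype=float)
    d_ref = np.linalg.norm(ref[:, None, :] - ref[None, :, :], axis=-1)
    d_mod = np.linalg.norm(model[:, None, :] - model[None, :, :], axis=-1)
    i, j = np.triu_indices(len(ref), k=min_sep)   # residue pairs with j - i >= min_sep
    native = d_ref[i, j] < cutoff
    if not native.any():
        raise ValueError("reference structure has no contacts at this cutoff")
    return float(np.mean(d_mod[i, j][native] < cutoff))


def ideal_helix(n, radius=2.3, rise=1.5, twist_deg=100.0):
    """Right-handed helical C-alpha trace with illustrative geometry."""
    t = np.deg2rad(twist_deg) * np.arange(n)
    return np.column_stack([radius * np.cos(t), radius * np.sin(t), rise * np.arange(n)])


if __name__ == "__main__":
    helix = ideal_helix(12)
    mirror = helix * np.array([1.0, 1.0, -1.0])   # reflect through the xy-plane (left-handed helix)
    print("contact fraction:", contact_fraction(helix, mirror))  # 1.0
    print("RMSD (angstrom):", round(kabsch_rmsd(helix, mirror), 2))
```

Because a reflection preserves every interresidue distance, the contact score stays at 1.0, while the RMSD remains well above zero: no proper rotation maps a right-handed helix onto a left-handed one.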
Correspondence to: Jeffrey Skolnick
Biopolymers, Vol. 59, 305–309 (2001)
© 2001 John Wiley & Sons, Inc.
While most similarity measures can identify two identical structures, the interpretation of how dissimilar two unequal structures are is strictly measure dependent. In particular, the correlation between the RMSD and other similarity measures is typically high