Geometric Parameters in Nucleic Acids: Nitrogenous Bases
Lester Clowney, Shri C. Jain, A. R. Srinivasan, John Westbrook,
Wilma K. Olson, and Helen M. Berman*
Contribution from the Department of Chemistry, Rutgers University,
Piscataway, New Jersey 08855-0939
Received August 21, 1995
Abstract: We present estimates of the bond-length and bond-angle parameters for the nitrogenous base side groups
of nucleic acids. These values are the result of a statistical survey of small molecules in the Cambridge Structural
Database for which high-resolution X-ray and neutron crystal structures are available. The statistics include arithmetic
means and standard deviations for the different samples, as well as comparisons of the population distributions for
sugar- and non-sugar-derivatized bases. These accumulated data provide appropriate target values for refinements
of oligonucleotide structures, as well as sets of standard atomic coordinates for the five common bases.
Introduction
The number of X-ray crystallographic determinations of the structures of
nucleic acids and nucleic acid-protein complexes has increased
dramatically over the last several years. A survey of
the Nucleic Acid Database (NDB)¹ shows that there are over
300 solved oligonucleotide structures and 50 nucleic acid
complexes currently available; the number of structure determinations
continues to increase. The refinement of such
oligonucleotides, most of which are determined with resolution
poorer than 1 Å, necessitates the use of geometric restraints.
Thus, it is critical to have values for the target bond lengths
and valence angles that are as accurate as possible. The best
source of these target values is high-resolution crystal structures
of nucleic acid analogs. More than 13 years have passed
since Taylor and Kennard² first analyzed the bonding geometries
of nucleic acid base moieties in the Cambridge Structural
Database (CSD).³
Since then, the number of available high-resolution
structures containing the nucleobases has nearly
doubled, and there are now sufficient data to determine
independent values for uracil and thymine. The larger sample
size further allows for the use of more stringent criteria in
selecting structures to include in the analyses. For instance,
the maximum R factor of structures included in this survey was
6% (compared to a value of 8% in Taylor and Kennard²) and
the maximum average error in C-C bonds (average estimated
standard deviation, esd) was 0.01 Å (versus a value of 0.015 Å
in Taylor and Kennard²). An updated analysis of the base
structures in the CSD is presented here.
Methods
Selection of Structures. Sets of high-resolution structures containing
the five nitrogenous bases (cytosine, thymine, uracil, adenine, and
guanine; Figure 1) were initially collected from the CSD using the
program QUEST.³ Protonated cytosines and adenines were treated
independently from neutral species, while protonated guanines were
excluded due to the small sample size. The sampling criteria were
established on the basis of both chemical and crystallographic
considerations.
Only structures with R values better than 6% were used. This value
was chosen by determining the R factor at which there is a
statistically significant reduction in the standard deviations of bond
lengths and valence angles. Subsets of bond lengths or bond angles
were examined using progressively smaller R factors as
cutoffs for the structures to include: the initial set included all
structures with an R factor less than 8%, the second set included those
with a maximum R factor of 7.5%, and so on, using cutoffs down to
R = 4.5% in 0.5% decrements. Means and standard deviations were
determined for each set, and the F test (see below) was used to compare
the variance of the initial set, where the value of R was 8%, with
that of each succeeding set. A significant reduction in the sample
variance was found at R = 6%.
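The cutoff scan described above can be illustrated with a short Python sketch. It is an illustration under stated assumptions, not the authors' procedure as implemented: the function names, the `(r_factor, value)` record layout, and the inclusive `<=` comparison at each cutoff are all hypothetical. Only the variance ratio F and its degrees of freedom are computed; judging significance would still require comparing F with a critical value of the F distribution at a chosen confidence level.

```python
from statistics import variance


def r_factor_cutoffs(start=8.0, stop=4.5, step=0.5):
    """Return the R-factor cutoffs in percent: 8.0, 7.5, ..., 4.5."""
    n_steps = int(round((start - stop) / step))
    return [start - i * step for i in range(n_steps + 1)]


def variance_ratios(records, cutoffs=None):
    """Compare the variance of the initial set with each tighter subset.

    records: list of (r_factor_percent, value) pairs, where value is one
             bond length or bond angle (hypothetical layout).
    Returns a list of tuples (cutoff, n_subset, F, df_initial, df_subset),
    with F = var(initial set) / var(subset).
    """
    if cutoffs is None:
        cutoffs = r_factor_cutoffs()
    # Initial set: everything passing the loosest cutoff (assumed inclusive).
    initial = [v for r, v in records if r <= cutoffs[0]]
    var_initial = variance(initial)  # sample variance, n - 1 denominator
    results = []
    for cutoff in cutoffs[1:]:
        subset = [v for r, v in records if r <= cutoff]
        if len(subset) < 2:
            continue  # variance is undefined for fewer than two values
        var_subset = variance(subset)
        f_ratio = var_initial / var_subset if var_subset > 0 else float("inf")
        results.append(
            (cutoff, len(subset), f_ratio, len(initial) - 1, len(subset) - 1)
        )
    return results


if __name__ == "__main__":
    # Made-up bond lengths (Å) paired with made-up R factors (%).
    demo = [(7.8, 1.30), (7.2, 1.40), (5.5, 1.35),
            (5.0, 1.36), (4.0, 1.35), (3.5, 1.34)]
    for row in variance_ratios(demo):
        print(row)
```

An F ratio well above 1 at a given cutoff indicates that the tighter subset has a smaller variance than the initial set; whether that reduction is significant depends on the two degrees of freedom returned alongside it.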
The selected structures had to meet two additional crystallographic
criteria. The statistical sample was limited to structures with (1)
resolution better than 1 Å and (2) esd’s for C-C bond lengths less
than 0.01 Å. In the structures meeting these criteria, most hydrogen atoms were located
directly or with difference Fourier maps.
Several chemical criteria were also used. Only pyrimidines substituted
at N1 and purines substituted at N9 were selected. Of these
structures, those with a sugar substitution were also treated separately
to see if sugar derivatization had a significant effect on base geometry.
Neutral bases and protonated bases were considered separately, while
hemi-protonated bases, crystal structures containing transition metals or atoms
as heavy as bromine (Br), and oligonucleotides were excluded from
consideration.
The CSD codes for the structures selected are listed in Table 1.
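The crystallographic and chemical criteria listed in this section can be collected into a single predicate, sketched here in Python. This is a hedged illustration rather than a reconstruction of the QUEST queries: the `Entry` record, its field names, and the refcodes in the usage lines are hypothetical, and the treatment of the boundary values (R factor of exactly 6% accepted, resolution and esd compared with strict inequalities) is an assumption, since the text describes the limits in words only.

```python
from dataclasses import dataclass

PYRIMIDINES = {"cytosine", "thymine", "uracil"}
PURINES = {"adenine", "guanine"}


@dataclass
class Entry:
    """One candidate crystal structure (hypothetical record layout)."""
    refcode: str            # CSD identifier; example values are made up
    base: str               # e.g. "cytosine"
    substituted_at: str     # ring nitrogen carrying the substituent
    r_factor: float         # percent
    resolution: float       # Å
    cc_esd: float           # average esd of C-C bond lengths, Å
    protonated: bool = False
    hemi_protonated: bool = False
    has_transition_metal: bool = False
    has_heavy_atom: bool = False      # atoms as heavy as bromine
    is_oligonucleotide: bool = False


def passes_selection(e: Entry) -> bool:
    """Apply the selection criteria described in the text."""
    # Crystallographic criteria (boundary handling is an assumption).
    crystallographic = (
        e.r_factor <= 6.0 and e.resolution < 1.0 and e.cc_esd < 0.01
    )
    # Chemical criteria: pyrimidines at N1, purines at N9.
    if e.base in PYRIMIDINES:
        site_ok = e.substituted_at == "N1"
    elif e.base in PURINES:
        site_ok = e.substituted_at == "N9"
    else:
        site_ok = False
    # Exclusions named in the text, including protonated guanines.
    excluded = (
        e.hemi_protonated
        or e.has_transition_metal
        or e.has_heavy_atom
        or e.is_oligonucleotide
        or (e.base == "guanine" and e.protonated)
    )
    return crystallographic and site_ok and not excluded


if __name__ == "__main__":
    candidates = [
        Entry("AAAAAA", "cytosine", "N1", 4.2, 0.85, 0.005),
        Entry("BBBBBB", "adenine", "N9", 7.0, 0.90, 0.006),
    ]
    print([c.refcode for c in candidates if passes_selection(c)])
```

Entries that pass would then be split further into neutral versus protonated and sugar versus non-sugar samples, as the text describes; that grouping is left out of the sketch.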
Software. The CSD programs QUEST and GSTAT³ were initially
used to select structures and extract information from the CSD. The
program QUEST was used to generate files containing a range of
* To whom correspondence should be addressed.
Abstract published in Advance ACS Abstracts, January 1, 1996.
(1) Berman, H. M.; Olson, W. K.; Beveridge, D. L.; Westbrook, J.;
Gelbin, A.; Demeny, T.; Hsieh, S. H.; Srinivasan, A. R.; Schneider, B.
Biophys. J. 1992, 63, 751-759.
(2) Taylor, R.; Kennard, O. J. Mol. Struct. 1982, 78, 1-28.
(3) Allen, F. H.; Bellard, S.; Brice, M. D.; Cartwright, B. A.; Doubleday,
A.; Higgs, H.; Hummelink, T.; Hummelink-Peters, B. G.; Kennard, O.;
Motherwell, W. D. S.; Rodgers, J. R.; Watson, D. G. Acta Crystallogr.
1979, B35, 2331-2339.
Figure 1. Structures of the nitrogenous bases which are considered in
this survey. The N1 nitrogen atoms of pyrimidines and the N9 nitrogens
of purines are shown in a linkage to the C1′ carbon of the sugar ring.
J. Am. Chem. Soc. 1996, 118, 509-518
0002-7863/96/1518-0509$12.00/0 © 1996 American Chemical Society