NMR Observation of a Novel C-Tetrad in the Structure of the SV40 Repeat Sequence GGGCGG

P. K. Patel, Neel S. Bhavesh, and R. V. Hosur¹

Department of Chemical Sciences, Tata Institute of Fundamental Research, Homi Bhabha Road, Mumbai 400 005, India

Received March 3, 2000

We report the NMR structure of the DNA sequence d-TGGGCGGT, which contains a repeat sequence from the SV40 viral genome, in Na⁺ solutions at neutral pH. The structure is a novel quadruplex incorporating a C-tetrad formed by symmetrical pairing of four Cs via NH₂–O2 H-bonds in a plane. The C-tetrad has a wider cavity than the G-tetrads and stacks well over the adjacent G4 tetrad, but poorly on the G6 tetrad. The quadruplex helix is largely underwound by 8–10° compared with B-DNA, except at the C5–G6 step. To our knowledge this is the first report of C-tetrad formation in DNA structures, and it is significant from the point of view of both structural diversity and specific recognition. © 2000 Academic Press

Key Words: NMR structure; G-quadruplex; C-tetrad; GGGCGG repeat.

Multistranded DNA structures have assumed great importance in recent years with the realization that they play important roles in DNA recombination, replication, disease control, etc. on the one hand and in DNA packaging inside a living cell on the other (1, 2). Among these, the quadruplex structures formed by many repeat sequences containing stretches of Gs have exhibited great variety and dependence on experimental conditions, especially the cations (3–6). They have created new paradigms and underscored the possibility of roles in DNA function that are not yet understood. The variability in the quadruplex structures is so large that, given a repeat sequence and the experimental conditions, it is hardly possible to predict the characteristics of the structure.
Thus investigations on different sequences under different conditions have led to the discovery of many new structural motifs, such as the G:C:G:C tetrad (7, 8), the U-tetrad (9), the A-tetrad (10, 11), and the T-tetrad (12). We report here the observation of yet another new motif, namely the C-tetrad, which we discovered in the sequence d-TGGGCGGT. The GGGCGG sequence contained in this DNA is biologically significant for several reasons: (i) it is a repeat sequence in simian virus 40 (SV40), playing important roles in viral encapsidation (13, 14); (ii) it is a target for many anticancer drugs (13); (iii) it is the recognition sequence of the SP1 transcription factor (15); and (iv) it is a very common sequence in CpG islands in vertebrate genomes (16).

MATERIALS AND METHODS

DNA samples. The oligonucleotide was synthesized on an Applied Biosystems 392 automated DNA synthesizer on a 10 μM scale using solid-phase β-cyanoethyl phosphoramidite chemistry, cleaved from the support, and purified by standard procedures (17, 18). The NMR sample was prepared at a monomer strand concentration of 1–2 mM in 0.6 ml (90% H₂O/10% D₂O) containing 10 mM sodium phosphate, 0.2 mM EDTA, pH 7.0, and 200 mM NaCl. For experiments in D₂O, the same sample was repeatedly lyophilized from D₂O.

NMR data acquisition and processing. NMR data were obtained on a VARIAN UNITY-plus 600 spectrometer. Temperature-dependent one-dimensional spectra (−5 to 50°C) and NOESY spectra in H₂O were recorded using the jump-and-return pulse sequence (19) for H₂O suppression. Phase-sensitive NOESY (20) and TOCSY (21) spectra in D₂O were recorded with mixing times of 80, 100, 200, and 300 ms for NOESY and 60 and 20 ms for TOCSY. A DQF-COSY spectrum was recorded for coupling constant estimation. In all the 2D experiments, the time domain data consisted of 2048 complex points in t₂ and 400–600 fids (free induction decay signals) in the t₁ dimension.
The VARIAN data were processed using VNMR, Felix-230, and Felix-97 software on an IRIS workstation. The data were apodized by shifted (60–90°) sine bell functions prior to 2D Fourier transformation.

Experimental restraints. The cross-peaks in the NOESY spectra in D₂O were integrated, and the intensities in a short-mixing-time NOESY spectrum were translated into interproton distances under the initial rate approximation, with the CH5–CH6 cross-peak intensity as the reference (2.46 Å). These cross-peaks in the different spectra were then classified as strong, medium, or weak according to their relative intensities, and the interproton distances were restrained with upper and lower bounds of 0.2, 0.5, and 1.0 Å, respectively, from their calculated distances. The narrow bounds were mostly on strong intranucleotide cross-peaks, for which the possible distance ranges are small and known. For the sequential internucleotide NOEs, loose

Abbreviations used: DNA, deoxyribonucleic acid; NMR, nuclear magnetic resonance; NOESY, nuclear Overhauser enhancement spectroscopy; TOCSY, total correlation spectroscopy; DQF-COSY, double quantum filtered correlation spectroscopy; IRMA, iterative relaxation matrix analysis; MD, molecular dynamics.

¹ To whom correspondence should be addressed. Fax: 091-22-215 2110. E-mail: hosur@tifr.res.in.

Biochemical and Biophysical Research Communications 270, 967–971 (2000). doi:10.1006/bbrc.2000.2479, available online at http://www.idealibrary.com. 0006-291X/00 $35.00. Copyright © 2000 by Academic Press. All rights of reproduction in any form reserved.
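The distance calibration described under Experimental restraints can be illustrated with a short Python sketch. It is not the authors' code. It assumes the usual initial-rate relation, in which cross-peak intensity scales as r⁻⁶, so that r = r_ref (I_ref / I)^(1/6), and it reads the quoted bounds as symmetric windows of ±0.2, ±0.5, and ±1.0 Å around the calculated distance. The function names and the idea of passing the strength class in by hand are hypothetical; the text does not give the intensity thresholds used to sort peaks into strong, medium, and weak.

```python
# Sketch of NOE distance calibration under the initial-rate approximation.
# Assumption: intensity is proportional to r**-6, calibrated against the
# cytosine H5-H6 cross-peak at 2.46 angstroms (reference value from the text).

REF_DISTANCE = 2.46  # angstroms, CH5-CH6 reference distance

# Bound widths in angstroms per intensity class (values from the text;
# the symmetric +/- interpretation is an assumption).
BOUND_WIDTHS = {"strong": 0.2, "medium": 0.5, "weak": 1.0}


def noe_distance(intensity, ref_intensity, ref_distance=REF_DISTANCE):
    """Convert a cross-peak intensity to a distance via r = r_ref * (I_ref / I)**(1/6)."""
    if intensity <= 0 or ref_intensity <= 0:
        raise ValueError("intensities must be positive")
    return ref_distance * (ref_intensity / intensity) ** (1.0 / 6.0)


def distance_restraint(intensity, ref_intensity, strength):
    """Return (lower, calculated, upper) distances for one cross-peak.

    `strength` is supplied by the user because the classification
    thresholds are not stated in the text.
    """
    distance = noe_distance(intensity, ref_intensity)
    width = BOUND_WIDTHS[strength]
    return (distance - width, distance, distance + width)


if __name__ == "__main__":
    # Hypothetical intensities in arbitrary units, for illustration only.
    lower, calc, upper = distance_restraint(25.0, 100.0, "medium")
    print(f"calculated {calc:.2f} A, restrained to [{lower:.2f}, {upper:.2f}] A")
```

Because of the sixth-root dependence, a cross-peak 64 times weaker than the reference corresponds to only twice the reference distance (4.92 Å), which is why a short mixing time and a reliable reference peak matter more than precise integration of weak peaks.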