J. Mol. Biol. (1977) 110, 467-510

Nucleotide Sequence of Simian Virus 40 Hind H Restriction Fragment

G. VOLCKAERT, R. CONTRERAS, E. SOEDA, A. VAN DE VOORDE AND W. FIERS

Laboratory of Molecular Biology and Laboratory of Physiological Chemistry
Ledeganckstraat, 35, 9000 Ghent, Belgium

(Received 1 September 1976, and in revised form 15 November 1976)

The restriction fragment Hind H contains 5.2% of the genome of simian virus 40 (SV40) and is located near the middle of the early region. It can be split by the Arthrobacter luteus (Alu) enzyme into fragments Hind H-A1 and Hind H-A2. The nucleotide sequence of fragment Hind H is reported here. It has been established by analysis of transcription products, synthesized by Escherichia coli RNA polymerase and nucleoside triphosphates, one of which was (α-32P)-labeled. These products are very heterogeneous in size and may even exceed the length of the template. Strand assignment was possible by hybridizing the asymmetric, labeled transcript of total SV40 DNA to filter-bound Hind H DNA. Very clean, discrete products were obtained under appropriate conditions where transcription was dependent on an added primer such as (Ap)sA. These transcripts were derived from one strand only, except in the case of Hind H-A2, where the product (in a single chain) contained information derived from both strands. Unambiguous confirmation of the sequence was obtained by experiments directly on the terminally labeled DNA fragments.

The message strand is particularly Ap-rich (37%) and purine rich. The dinucleotide CpG occurs only once but UpC is also rare. The nucleotide sequence can be translated unambiguously into an amino acid sequence. This polypeptide, which is part of the gene A-protein, is neutral, rich in methionine, cysteine and tyrosine, and has a high lysine to arginine ratio. All serine codons are of the A-G-Py type; A and U are clearly preferred as third bases.

1. Introduction

Simian virus 40 (SV40) contains a circular, supercoiled DNA genome with Mr = 3.2 × 10^6. The restriction enzyme preparation from Hemophilus influenzae Rd (a mixture of HindII and HindIII) cleaves this SV40 DNA into 13 fragments, which have been ordered (Danna & Nathans, 1971; Yang et al., 1975). Various biological functions have been located on the physical map, as defined by the restriction enzyme cleavage sites. More particularly, it has been shown that the early messenger accounts for about 48% of the genetic information (Sambrook et al., 1972; Khoury et al., 1975) and is transcribed counterclockwise (Lindstrom & Dulbecco, 1972; Sambrook et al., 1973; Khoury et al., 1973). Presumably it codes for a single large polypeptide (Del Villano & Defendi, 1973; Prives et al., 1975; Tegtmeyer, 1974; Smith et al., 1975). Genetically this product is defined by the temperature sensitive A complementation group (Tegtmeyer & Ozer, 1971; Chou & Martin, 1974; Lai & Nathans, 1975), and it is likely that it corresponds to the T-antigen. The action of the