MICROBIAL DRUG RESISTANCE Volume 7, Number 2, 2001 Mary Ann Liebert, Inc.

Annotated Draft Genomic Sequence from a Streptococcus pneumoniae Type 19F Clinical Isolate*

JOAQUÍN DOPAZO,¹,⁵ ALFONSO MENDOZA,¹ JAVIER HERRERO,¹,⁵ FABRIZIO CALDARA,² YVES HUMBERT,³,⁶ LAURENCE FRIEDLI,³,⁶ MIREILLE GUERRIER,³,⁶ ELISABETH GRAND-SCHENK,³,⁶ CARINE GANDIN,³,⁶ MASSIMO DE FRANCESCO,³,⁶ ALESSANDRA POLISSI,² GARY BUELL,³,⁶ GEORG FEGER,³,⁶ ERNESTO GARCÍA,⁴ MANUEL PEITSCH,³,⁷ and JOSÉ F. GARCÍA-BUSTOS¹

ABSTRACT

The public availability of numerous microbial genomes is enabling the analysis of bacterial biology in great detail and with an unprecedented organism-wide and taxon-wide scope. Streptococcus pneumoniae is one of the most important bacterial pathogens throughout the world. We present here sequences and functional annotations for 2.1 Mbp of pneumococcal DNA, covering more than 90% of the estimated total size of the genome. The sequenced strain is a clinical isolate resistant to macrolides and tetracycline. It carries a type 19F capsular locus, but multilocus sequence typing of several conserved genetic loci suggests that the strain belongs to a pneumococcal lineage that most often expresses a serotype 15 capsular polysaccharide. A total of 2,046 putative open reading frames (ORFs) longer than 100 amino acids were identified (average of 1,009 bp per ORF), including all described two-component systems and aminoacyl-tRNA synthetases. Comparisons with other complete or nearly complete bacterial genomes were made and are presented in graphical form for all the predicted proteins.

INTRODUCTION

Statistics assembled by the World Health Organization (WHO) show that 3.5 million people died of pneumonia worldwide in 1998, more than from any other infectious disease,¹⁷ and Streptococcus pneumoniae is one of the main etiological agents of this disease. Despite widespread use of antibiotics, S.
pneumoniae has remained a major human pathogen worldwide.

Microbiology is being revolutionized by the availability of completely sequenced genomes. Individual gene functions can be analyzed within the context of the complete physiological capabilities of the microorganism, and horizontal gene movements can be tracked across taxonomic phyla. Although a growing number of bacterial genomes are in the public domain, an annotated pneumococcal genome is still lacking, and this slows research on this important pathogen. Here we present the results of an effort to automatically annotate 2.1 Mbp of DNA sequence, covering more than 90% of the estimated total size of the pneumococcal genome.¹⁰ It has recently been shown that analysis of gapped genomic sequences can allow an accurate reconstruction of microbial metabolism.²³ We therefore hope that the public availability of the sequences and functional annotations presented here will contribute to basic and clinical research on this important pathogen, which is quickly becoming resistant to most useful antibiotics.

MATERIALS AND METHODS

DNA sequencing

Sequencing was done on DNA from a type 19F clinical isolate, S. pneumoniae strain G54.¹⁹ Genomic DNA was purified on CsCl, mechanically sheared, end-repaired, and ligated into M13mp18 by using BstXI cloning technology to eliminate vec-

*The full text of this paper is available online at www.liebertpub.com/mdr
¹Research Department, GlaxoSmithKline S.A., 28760 Tres Cantos, Spain.
²Department of Microbiology, Medicine Research Centre, GlaxoSmithKline S.p.A., 37100 Verona, Italy.
³Geneva Biomedical Research Institute, Glaxo Wellcome Research and Development S.A., Switzerland.
⁴Departamento de Microbiología Molecular, Centro de Investigaciones Biológicas, CSIC, Velázquez 144, 28006 Madrid, Spain.
⁵Present address: Bioinformatics Unit, CNIO, 28220 Majadahonda, Spain.
⁶Present address: Serono Pharmaceutical Research Institute, Serono International S.A., 14 chemin des Aulx, CH-1228 Plan-les-Ouates, Geneva, Switzerland.
⁷Present address: Novartis Pharma AG, WKL-490, 4002 Basel, Switzerland.