Genome Sequences of Escherichia coli B strains REL606 and BL21(DE3)

Haeyoung Jeong¹, Valérie Barbe², Choong Hoon Lee¹,³, David Vallenet², Dong Su Yu¹, Sang-Haeng Choi¹, Arnaud Couloux², Seung-Won Lee¹, Sung Ho Yoon¹, Laurence Cattolico², Cheol-Goo Hur¹,⁴, Hong-Seog Park¹,⁴, Béatrice Ségurens², Sun Chang Kim³, Tae Kwang Oh¹,⁵, Richard E. Lenski⁶, F. William Studier⁷, Patrick Daegelen²,⁸ and Jihyun F. Kim¹,⁴

Escherichia coli K-12 and B have been the subjects of classical experiments from which much of our understanding of molecular genetics has emerged. We present here complete genome sequences of two E. coli B strains, REL606, used in a long-term evolution experiment, and BL21(DE3), widely used to express recombinant proteins. The two genomes differ in length by 72,304 bp and have 426 single base pair differences, a seemingly large difference for laboratory strains having a common ancestor within the last 67 years. Transpositions by IS1 and IS150 have occurred in both lineages. Integration of the DE3 prophage in BL21(DE3) apparently displaced a defective prophage in the λ attachment site of B. As might have been anticipated from the many genetic and biochemical experiments comparing B and K-12 over the years, the B genomes are similar in size and organization to the genome of E. coli K-12 MG1655 and have >99% sequence identity over 92% of their genomes. E. coli B and K-12 differ considerably in distribution of IS elements and in location and composition of larger mobile elements. An unexpected difference is the absence of a large cluster of flagella genes in B, due to a 41 kbp IS1-mediated deletion. Gene clusters that specify the LPS core, O antigen, and restriction enzymes differ substantially, presumably because of horizontal transfer. Comparative analysis of 32 independently isolated E.
coli and Shigella genomes, both commensals and pathogenic strains, identifies a minimal set of genes in common plus many strain-specific genes that constitute a large E. coli pan-genome.

© 2009 Elsevier Ltd. All rights reserved.

¹ Korea Research Institute of Bioscience and Biotechnology (KRIBB), 111 Gwahangno, Yuseong, Daejeon 305-806, Korea
² CNRS UMR 8030, Genoscope (CEA), 2 rue Gaston Crémieux, CP 5706, 91000 Evry Cedex, France
³ Department of Biological Sciences, Korea Advanced Institute of Science and Technology, Daejeon 305-701, Korea
⁴ Functional Genomics Program, University of Science and Technology, Yuseong, Daejeon 305-333, Korea
⁵ 21C Frontier Microbial Genomics and Applications Center, Yuseong, Daejeon 305-806, Korea
⁶ Department of Microbiology and Molecular Genetics, Michigan State University, East Lansing, MI 48824, USA
⁷ Biology Department, Brookhaven National Laboratory, P.O. Box 5000, Upton, NY 11973-5000, USA
⁸ Inserm, 101 rue de Tolbiac, 75013 Paris, France

*Corresponding authors. E-mail addresses: jfk@kribb.re.kr; daegelen@genoscope.cns.fr; studier@bnl.gov.

Abbreviations used: SNP, single base pair difference; LPS, lipopolysaccharide.

doi:10.1016/j.jmb.2009.09.052
J. Mol. Biol. (2009) 394, 644–652