Journal of Structural Biology 142 (2003) 133–143
www.elsevier.com/locate/yjsbi

Protein production in Escherichia coli for structural studies by X-ray crystallography

Celia W. Goulding a and L. Jeanne Perry a,b,*

a UCLA–DOE Center for Genomics and Proteomics, University of California at Los Angeles, Los Angeles, CA 90095-1570, USA
b Department of Molecular, Cell and Developmental Biology, University of California at Los Angeles, Los Angeles, CA 90095-1570, USA

* Corresponding author. Fax: 1-310-206-3914. E-mail address: perry@mbi.ucla.edu (L. Jeanne Perry).

Received 5 February 2003

Abstract

The arrival of genomic sequences in the databases has provided a seemingly unlimited supply of targets for protein structure determination and has raised the possibility of solving the structures of an entire proteome. Based on our experience with the proteomes of Pyrobaculum aerophilum and Mycobacterium tuberculosis, we have developed a simple strategy for producing proteins for structural studies by X-ray crystallography. Our scheme emphasizes a strong commitment to protein targets and includes the expression of genes from these organisms in Escherichia coli. These proteins are expressed with affinity tags and purified for characterization and crystallization. We have identified protein solubility and crystallization as the two major bottlenecks on the path to determining protein structures by X-ray diffraction. Strategies to overcome these bottlenecks are discussed.

© 2003 Elsevier Science (USA). All rights reserved.

Keywords: Protein expression; Protein solubility; Structural genomics; X-ray crystallography

1047-8477/03/$ - see front matter
doi:10.1016/S1047-8477(03)00044-3

1. Introduction

The UCLA–DOE Center for Genomics and Proteomics is involved in two structural genomics projects. The first is the structural genomics of Pyrobaculum aerophilum, a hyperthermophilic archaeon discovered by Karl Stetter and co-workers in a boiling ocean water vent at Maronti Beach, Italy (Volkl et al., 1993). This organism has an optimal growth temperature of 100 °C and is facultatively aerobic. Its genome was sequenced (Fitz-Gibbon et al., 1997) and annotated at UCLA in the laboratory of Jeffrey H. Miller (Fitz-Gibbon, 1998). The P. aerophilum genome is 2.2 Mb in size and has a GC content of 51%. It is believed to contain approximately 2587 protein-encoding genes covering 88% of the genome (Fitz-Gibbon et al., 2002). One goal of the P. aerophilum study is to solve protein structures by X-ray crystallography. These structures should elucidate the role of structure in protein thermostability and shed more light on this little-studied kingdom of the biosphere.

The second structural genomics project undertaken by the Center involves Mycobacterium tuberculosis. Stewart Cole and co-workers sequenced the genome of M. tuberculosis H37Rv in 1998 and found that it comprises 4.4 Mb with a relatively high (66%) GC content (Cole et al., 1998). This organism causes considerable suffering and death in the world today. Approximately one-third of the world population is infected with M. tuberculosis (Kochi, 1991), and 2–3 million people die of tuberculosis each year, principally in Third World countries (Webb and Davies, 1999). However, the incidence in the United States has also been increasing, primarily because of infection among HIV-positive individuals (CDC, 1991; Chalsson and Slutkin, 1989). Especially troubling is the appearance of multidrug-resistant strains of M. tuberculosis (Fischl et al., 1992; Webb and Davies, 1999). For these reasons, the World Health Organization (1998) has declared tuberculosis an international emergency.

Our laboratory is part of the M. tuberculosis Structural Genomics Consortium, which comprises 80 laboratories from 66 universities and institutes in 11 countries. The central goal of the M. tuberculosis Structural Genomics Consortium is to