ESPRIT: An automated, library-based method for mapping and soluble expression of protein domains from challenging targets

Hayretin Yumerefendi 1, Franck Tarendeau 1, Philippe J. Mas, Darren J. Hart *

Unit of Virus Host–Cell Interactions, UJF-EMBL-CNRS, UMI 3265, 6 rue Jules Horowitz, BP181, 38042 Grenoble Cedex 9, France
Grenoble Outstation, European Molecular Biology Laboratory, 6 rue Jules Horowitz, BP181, 38042 Grenoble Cedex 9, France

Article info
Article history: Received 12 January 2010; received in revised form 24 February 2010; accepted 28 February 2010; available online 4 March 2010
Keywords: ESPRIT; NF-κB; PB2; TBK1; High-throughput; Directed evolution; Protein structure; Protein expression

Abstract
Expression of sufficient quantities of soluble protein for structural biology and other applications is often a very difficult task, especially when multimilligram quantities are required. In order to improve yield, solubility or crystallisability of a protein, it is common to subclone shorter genetic constructs corresponding to single- or multi-domain fragments. However, it is not always clear where domain boundaries are located, especially when working on novel targets with little or no sequence similarity to other proteins. Several methods have been described employing aspects of directed evolution to the recombinant expression of challenging proteins. These combine the construction of a random library of genetic constructs of a target with a screening or selection process to identify solubly expressing protein fragments. Here we review several datasets from the ESPRIT (Expression of Soluble Proteins by Random Incremental Truncation) technology to provide a view on its capabilities. Firstly, we demonstrate how it functions using the well-characterised NF-κB p50 transcription factor as a model system.
Secondly, application of ESPRIT to the challenging PB2 subunit of influenza polymerase has led to several novel atomic resolution structures; here we present an overview of the screening phase of that project. Thirdly, analysis of the human kinase TBK1 is presented to show how the ESPRIT technology rapidly addresses the compatibility of challenging targets with the Escherichia coli expression system.

© 2010 Elsevier Inc. All rights reserved.

1. Introduction

The success of structural characterisation of proteins largely depends on the ability to produce sufficient quantities, generally tens of milligrams, of soluble purified protein (Blundell et al., 2002). Similar amounts of protein may be required for non-structural applications such as inhibitor screening or biophysical analyses. The challenges associated with overexpression and purification of monodisperse, soluble protein are well appreciated by any structural biology laboratory. Over the last decade, structural genomics projects have permitted a more quantitative measure of the efficiency of different steps in the structure solution process under a relatively standardised set of conditions (Burley, 2000; O'Toole et al., 2004). Clearly, proteins do not behave uniformly during recombinant expression steps; notably, the success rate for a single-domain protein may be very different from that for a large multi-domain protein when Escherichia coli is used as the preferred production system. Success rates for bacterial expression are generally higher with prokaryotic proteins than with those of human or viral origin, perhaps because the bacterial proteins are frequently smaller with a simpler domain arrangement. Human or viral proteins are often larger and comprise multiple domains connected by longer linkers or low-complexity regions (Ward et al., 2004).
They may be subunits of multi-component complexes, or at least require interaction of partners for stability, either via binding or post-translational modification. Expression of such proteins in full-length form in E. coli frequently results in aggregation or degradation (Dobson, 2004). Sometimes targets can be expressed more successfully in eukaryotic expression hosts, for example in insect cells using the baculovirus expression system (Hofinger et al., 2007), but these are far from universal solutions and present their own set of obstacles such as cost, duration, variable post-translational modification and incompatibility with isotopic or heavy atom labelling.

* Corresponding author. Address: Unit of Virus Host–Cell Interactions, UJF-EMBL-CNRS, UMI 3265, 6 rue Jules Horowitz, BP181, 38042 Grenoble Cedex 9, France. E-mail address: hart@embl.fr (D.J. Hart).
1 These authors contributed equally to this work.
Abbreviations: ESPRIT, expression of soluble proteins by random incremental truncation; GFP, green fluorescent protein; TBK1, TANK-binding kinase 1; IPTG, isopropyl-beta-D-thiogalactopyranoside; TEV, tobacco etch virus; BCCP, biotin carboxyl carrier protein.
Journal of Structural Biology 172 (2010) 66–74
1047-8477/$ - see front matter. doi:10.1016/j.jsb.2010.02.021

When a full-length protein fails to express in soluble form, or a purified protein does not crystallise, isolation of shorter constructs encompassing single- or multi-domain protein fragments is a common approach. Obtaining well-expressing protein domains is, traditionally, a time-consuming process involving repeated