Drug Discovery Today, Volume 12, Numbers 21/22, November 2007
REVIEWS

DNA fragmentation-based combinatorial approaches to soluble protein expression
Part I. Generating DNA fragment libraries

Chrisostomos Prodromou 1,4, Renos Savva 2,4 and Paul C. Driscoll 3,4

1 Section of Structural Biology, Institute of Cancer Research, Chester Beatty Laboratories, 237 Fulham Road, London SW3 6JB, United Kingdom
2 Institute of Structural Molecular Biology, School of Crystallography, Birkbeck College, Malet Street, London WC1E 7HX, United Kingdom
3 Institute of Structural Molecular Biology, Department of Biochemistry and Molecular Biology, University College London, Gower Street, London WC1E 6BT, United Kingdom
4 Domainex Ltd., The London Bioscience Innovation Centre, 2 Royal College Street, Camden, London NW1 0NH, United Kingdom

In addressing a new drug discovery target, the generation of tractable protein substrates for functional and structural analyses can represent a significant hurdle. Traditional approaches rely on protein expression trials of multiple variants in various systems, frequently with limited success. The growing knowledge base derived from genomics and structural proteomics initiatives assists the bioinformatics-led design of these experiments. Nevertheless, for many eukaryotic polypeptides, particularly those with relatively few homologues, the generation of useful protein products can still be a major challenge. This review describes the basis of efforts to forge an alternative ‘domain-hunting’ paradigm, based upon combinatorial sampling of expression construct libraries derived by fragmentation of the encoding DNA template, focusing on the methods and considerations involved in generating fragment-length DNA from target genes. An accompanying review focuses on the expression screening of such combinatorial DNA libraries to sample the corresponding set of protein fragments.
It has been an implicit, if not explicit, assumption that the undertaking of the Human Genome Project and other efforts to catalogue the genes of pathogenic organisms will drive a major expansion of the prescription pharmaceutical market [1–5]. One basis for this prediction is that recent analyses show that the majority of present-day drugs act upon a rather small number (perhaps fewer than 1000) of distinct macromolecular targets (e.g. GPCRs, nuclear receptors, ion channels, proteases, kinases and phosphodiesterases) [6]. Knowledge of the human genome sequence opens up the potential to explore many novel, perhaps rare, target types that have previously evaded identification by classical methods. Exploiting these new targets will depend upon our ability to translate the emerging genomic information into tractable in vitro and cell-based assays of macromolecular function. In this context, many areas of biomedicine increasingly need to understand protein function at the atomic level, which calls for 3D structural information derived from X-ray crystallography and, where applicable, multidimensional solution NMR spectroscopy [7,8]. Both methods place relatively high demands on the quantity, solubility, stability and ‘foldedness’ of the macromolecular analyte. An intrinsic bottleneck in such efforts is often the generation of recombinant, soluble, tractable protein materials that can be used for both inhibitor screening and structure-based drug discovery. In general, proteins corresponding to the whole open reading frames (ORFs) of cloned cDNAs can prove difficult or impossible to produce readily.
Consequently, considerable resources, in academia and in the commercial biotechnology and pharmaceutical sectors alike, are expended on efforts to obtain tractable fragments of such proteins, guided either by bioinformatics-driven prediction of the likely stable globular domains or by limited proteolysis of isolated full-length proteins. In this review, we discuss an emerging approach that attempts to bypass these ‘traditional’ methods by using high-throughput screening of DNA fragment libraries to identify stable, functionally and structurally tractable fragments of poly-

Corresponding author: Driscoll, P.C. (p.driscoll@ucl.ac.uk)
© 2007 Elsevier Ltd. All rights reserved. doi:10.1016/j.drudis.2007.08.012