Quantitative Assessment of Peptide Sequence Diversity in M13 Combinatorial Peptide Phage Display Libraries

Diane J. Rodi¹, Alexei S. Soares² and Lee Makowski¹*

¹ Combinatorial Biology Unit, Biosciences Division, Argonne National Laboratory, Bldg 202, 9700 South Cass Avenue, Argonne, IL 60439, USA
² Biology Department, Brookhaven National Laboratory, Bldg 463, PO Box 5000, Upton, NY 11973, USA

Novel statistical methods have been developed and used to quantitate and annotate the sequence diversity within combinatorial peptide libraries on the basis of small numbers (1–200) of sequences selected at random from commercially available M13 p3-based phage display libraries. These libraries behave statistically as though they correspond to populations containing roughly 4.0 ± 1.6% of the random dodecapeptides and 7.9 ± 2.6% of the random constrained heptapeptides that are theoretically possible within the phage populations. Analysis of amino acid residue occurrence patterns shows no demonstrable influence on sequence censorship by Escherichia coli tRNA isoacceptor profiles or either overall codon or Class II codon usage patterns, suggesting no metabolic constraints on recombinant p3 synthesis. There is an overall depression in the occurrence of cysteine, arginine and glycine residues and an overabundance of proline, threonine and histidine residues. The majority of position-dependent amino acid sequence bias is clustered at three positions within the inserted peptides of the dodecapeptide library, +1, +3 and +12 downstream from the signal peptidase cleavage site. Conformational tendency measures of the peptides indicate a significant preference for inserts favoring a β-turn conformation. The observed protein sequence limitations can primarily be attributed to genetic codon degeneracy and signal peptidase cleavage preferences.
These data suggest that for applications in which maximal sequence diversity is essential, such as epitope mapping or novel receptor identification, combinatorial peptide libraries should be constructed using codon-corrected trinucleotide cassettes within vector–host systems designed to minimize morphogenesis-related censorship.

© 2002 Elsevier Science Ltd. All rights reserved

Keywords: sequence diversity; filamentous phage; phage display; viral assembly; combinatorial peptide

*Corresponding author
E-mail address of the corresponding author: lmakowski@anl.gov
Abbreviation used: pmf, proton motive force.

Introduction

Since the seminal work of Scott & Smith¹ describing the first affinity selection from an extensive combinatorial peptide library displayed on the surface of a phage particle, phage display technology has been utilized for numerous purposes including mapping protein–ligand interactions,² identifying binding antagonists and enzyme inhibitors³ and designing peptide mimotopes as reduced risk vaccination agents.⁴ These applications fall into one of two categories: identification of peptide reagents that exhibit certain binding characteristics but no one particular primary amino acid sequence; and identification of peptides that mimic both the binding characteristics and amino acid sequence of a native protein. The ability to identify a displayed combinatorial peptide that mimics a ligand-binding region of a naturally occurring protein has been shown to depend upon a number of factors including length and linear continuity of the binding epitope and the number of independent sequences in the unselected combinatorial library.⁵⁻⁸

Comprehensive analysis of the primary amino acid sequence diversity in combinatorial peptide libraries is an issue that has received short shrift in the literature. Estimates of the number of
doi:10.1016/S0022-2836(02)00844-6, available online at http://www.idealibrary.com. J. Mol. Biol. (2002) 322, 1039–1052