Lennon et al. Genome Biology 2010, 11:R15
http://genomebiology.com/2010/11/2/R15

METHOD, Open Access

© 2010 Lennon et al.; license BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

A scalable, fully automated process for construction of sequence-ready barcoded libraries for 454

Niall J Lennon 1, Robert E Lintner 1, Scott Anderson 1, Pablo Alvarez 2, Andrew Barry 1, William Brockman 3, Riza Daza 1, Rachel L Erlich 1, Georgia Giannoukos 4, Lisa Green 1, Andrew Hollinger 1, Cindi A Hoover 5, David B Jaffe 4, Frank Juhn 1, Danielle McCarthy 1, Danielle Perrin 1, Karen Ponchner 1, Taryn L Powers 1, Kamran Rizzolo 1, Dana Robbins 1, Elizabeth Ryan 1, Carsten Russ 4, Todd Sparrow 1, John Stalker 1, Scott Steelman 1, Michael Weiand 1, Andrew Zimmer 1, Matthew R Henn 1, Chad Nusbaum 4 and Robert Nicol* 1

454 library construction: An automated method for constructing libraries for 454 sequencing significantly reduces the cost and time required.

Abstract

We present an automated, high throughput library construction process for 454 technology. Sample handling errors and cross-contamination are minimized via end-to-end barcoding of plasticware, along with molecular DNA barcoding of constructs. Automation-friendly magnetic bead-based size selection and cleanup steps have been devised, eliminating major bottlenecks and significant sources of error. Using this methodology, one technician can create 96 sequence-ready 454 libraries in 2 days, a dramatic improvement over the standard method.
Background

The emergence of next-generation sequencing technologies, such as the Roche/454 Genome Sequencer, the Illumina Genome Analyzer, the Applied Biosystems SOLiD sequencer and others, has provided the opportunity for both large genome centers and individual labs to generate DNA sequence data at an unprecedented scale [1]. However, as sequence output continues to increase dramatically, processes to generate sequence-ready libraries lag behind in scale. The minimum unit of sequence data (for example, lane or channel) already exceeds the amount required for small projects, such as viral or bacterial genomes, and will continue to increase. As a result, projects with large numbers of samples but small sequence per sample requirements become increasingly challenging to undertake in a cost-effective manner.

The 454 Genome Sequencer uses bead-in-emulsion amplification and a pyrosequencing chemistry to generate DNA sequence reads by synthesis [2]. Longer reads and shorter sequencing run times make the 454 platform a powerful tool for de novo assembly of small genomes, metagenomic profiling and amplicon sequencing compared with other next-generation sequencing platforms. However, these types of applications pose a challenge in that they require a relatively small number of reads from large numbers of samples. For example, for viruses such as HIV, the small (approximately 10 kb) genome size means that a single sample on even the smallest scale 454 picotiter plate configuration (1 region of a 16 region gasket) would yield over 1,500-fold coverage, vastly more coverage than required for genome assembly. Further, the standard 454 library construction protocol is not easily scalable and becomes a major cost driver relative to sequencing when modest numbers of reads are required from each sample.
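The coverage figure quoted above for HIV implies a rough per-region yield, which in turn suggests how many samples could share one region. The following Python sketch works through that arithmetic. The genome size and the 1,500-fold lower bound come from the text; the 50-fold target coverage is a hypothetical value chosen only for illustration, and the calculation assumes perfectly even pooling with no reads lost to barcodes or quality filtering.

```python
def implied_yield_bases(genome_size_bp: int, fold_coverage: int) -> int:
    """Total bases needed to cover a genome at the given fold coverage."""
    return genome_size_bp * fold_coverage


def samples_per_region(region_yield_bases: int, genome_size_bp: int,
                       target_coverage: int) -> int:
    """Whole number of samples that fit in one region at the target coverage."""
    bases_per_sample = genome_size_bp * target_coverage
    return region_yield_bases // bases_per_sample


GENOME_SIZE_BP = 10_000    # approximate HIV genome size, from the text
REGION_COVERAGE = 1_500    # lower bound for one region, from the text
TARGET_COVERAGE = 50       # hypothetical per-sample target (assumption)

# Per-region yield implied by ">1,500-fold coverage of a 10 kb genome".
region_yield = implied_yield_bases(GENOME_SIZE_BP, REGION_COVERAGE)

# Samples that could be pooled in one region under the assumptions above.
pooled = samples_per_region(region_yield, GENOME_SIZE_BP, TARGET_COVERAGE)

print(region_yield)  # 15000000
print(pooled)        # 30
```

Under these assumptions, about 30 HIV-sized samples would fit in a single region. Because the text gives 1,500-fold only as a lower bound, the real capacity would be at least that, before allowing for uneven pooling.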
In addition, when sequencing large numbers of isolates of the same organism, the sequence identity between samples makes cross-contamination virtually impossible to detect without a molecular (sequence-based) tag.

* Correspondence: nicol@broadinstitute.org
1 Genome Sequencing Platform, Broad Institute of MIT and Harvard, 320 Charles St., Cambridge, MA 02141, USA

We set out to devise a laboratory process for high-throughput 454 sequencing that is able to generate large numbers of sequence-ready libraries at low cost per sample. Opportunities for sample mix-up errors or cross-contamination must be minimized and the process must also support efficient pooling of samples to avoid the cost of over-sequencing. Key requirements for this process include: plate-based processing of