Lennon et al. Genome Biology 2010, 11:R15
http://genomebiology.com/2010/11/2/R15
Open Access METHOD
© 2010 Lennon et al.; license BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons
Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in
any medium, provided the original work is properly cited.
A scalable, fully automated process for construction of sequence-ready barcoded libraries for 454
Niall J Lennon¹, Robert E Lintner¹, Scott Anderson¹, Pablo Alvarez², Andrew Barry¹, William Brockman³, Riza Daza¹,
Rachel L Erlich¹, Georgia Giannoukos⁴, Lisa Green¹, Andrew Hollinger¹, Cindi A Hoover⁵, David B Jaffe⁴, Frank Juhn¹,
Danielle McCarthy¹, Danielle Perrin¹, Karen Ponchner¹, Taryn L Powers¹, Kamran Rizzolo¹, Dana Robbins¹,
Elizabeth Ryan¹, Carsten Russ⁴, Todd Sparrow¹, John Stalker¹, Scott Steelman¹, Michael Weiand¹, Andrew Zimmer¹,
Matthew R Henn¹, Chad Nusbaum⁴ and Robert Nicol*¹
454 library construction: An automated method for constructing libraries for 454 sequencing significantly reduces the cost and time required.
Abstract
We present an automated, high-throughput library construction process for 454 technology. Sample handling errors
and cross-contamination are minimized via end-to-end barcoding of plasticware, along with molecular DNA
barcoding of constructs. Automation-friendly magnetic bead-based size selection and cleanup steps have been
devised, eliminating major bottlenecks and significant sources of error. Using this methodology, one technician can
create 96 sequence-ready 454 libraries in 2 days, a dramatic improvement over the standard method.
Background
The emergence of next-generation sequencing technologies, such as the Roche/454 Genome Sequencer, the
Illumina Genome Analyzer, the Applied Biosystems SOLiD sequencer and others, has provided the opportunity
for both large genome centers and individual labs to generate DNA sequence data at an unprecedented scale [1].
However, as sequence output continues to increase dramatically, processes to generate sequence-ready
libraries lag behind in scale. The minimum unit of sequence data (for example, lane or channel) already
exceeds the amount required for small projects, such as viral or bacterial genomes, and will continue to
increase. As a result, projects with large numbers of samples but small per-sample sequence requirements
become increasingly challenging to undertake in a cost-effective manner.
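To make the scale mismatch concrete, the arithmetic can be sketched using the HIV figures quoted in the following paragraph (an approximately 10 kb genome and over 1,500-fold coverage from one region of a 16-region gasket). This is a minimal illustration, not part of the published method: the region yield is back-calculated from those two figures and is therefore a lower bound, and the target depth `TARGET_COVERAGE` is a hypothetical value chosen only for the example.

```python
# Illustrative coverage arithmetic. Only the genome size and the quoted
# coverage come from the text; the target depth is a hypothetical choice.


def fold_coverage(total_bases: float, genome_size_bp: float) -> float:
    """Average fold coverage = sequenced bases / genome length."""
    return total_bases / genome_size_bp


def samples_per_region(region_bases: int, genome_size_bp: int,
                       target_coverage: int) -> int:
    """Number of equally pooled samples one region supports at a target depth."""
    per_sample_bases = genome_size_bp * target_coverage
    return region_bases // per_sample_bases


GENOME_SIZE_BP = 10_000   # approximately 10 kb HIV genome (from the text)
QUOTED_COVERAGE = 1_500   # "over 1,500-fold" for one region (from the text)
TARGET_COVERAGE = 50      # hypothetical per-sample depth, NOT from the paper

# Yield of one region implied by the two quoted figures (a lower bound).
region_bases = GENOME_SIZE_BP * QUOTED_COVERAGE

print(fold_coverage(region_bases, GENOME_SIZE_BP))
print(samples_per_region(region_bases, GENOME_SIZE_BP, TARGET_COVERAGE))
```

Under these assumptions a single region would accommodate about 30 pooled samples, which is only practical if each sample carries a tag that lets its reads be separated after sequencing.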
The 454 Genome Sequencer uses bead-in-emulsion amplification and pyrosequencing chemistry to generate
DNA sequence reads by synthesis [2]. Compared with other next-generation sequencing platforms, its longer
reads and shorter sequencing run times make the 454 platform a powerful tool for de novo assembly of small
genomes, metagenomic profiling and amplicon sequencing. However, these types of applications pose a
challenge in that they require a relatively small number of reads from large numbers of samples. For example,
for viruses such as HIV, the small (approximately 10 kb) genome size means that a single sample on even the
smallest scale 454 picotiter plate configuration (one region of a 16-region gasket) would yield over
1,500-fold coverage, vastly more than required for genome assembly. Further, the standard 454 library
construction protocol is not easily scalable and becomes a major cost driver relative to sequencing when
modest numbers of reads are required from each sample. In addition, when sequencing large numbers of
isolates of the same organism, the sequence identity between samples makes cross-contamination virtually
impossible to detect without a molecular (sequence-based) tag. We set out to devise a laboratory process for
high-throughput 454 sequencing that is able to generate large numbers of sequence-ready libraries at low cost
per sample. Opportunities for sample mix-up errors or cross-contamination must be minimized and the process
must also support efficient pooling of samples to avoid the cost of over-sequencing. Key requirements for
this process include: plate-based processing of
* Correspondence: nicol@broadinstitute.org
¹Genome Sequencing Platform, Broad Institute of MIT and Harvard, 320 Charles St., Cambridge, MA 02141, USA