Optimized Multiplex PCR: Efficiently Closing a Whole-Genome
Shotgun Sequencing Project
Hervé Tettelin,*,1 Diana Radune,* Simon Kasif,† Hoda Khouri,* and Steven L. Salzberg*,‡
*The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, Maryland 20850; †Department of Electrical Engineering
and Computer Science, University of Illinois, Chicago, Illinois 60607; and ‡Department of Computer Science,
Johns Hopkins University, Baltimore, Maryland 21218
Received June 8, 1999; accepted October 26, 1999
A new method has been developed for rapidly clos-
ing a large number of gaps in a whole-genome shotgun
sequencing project. The method employs multiplex
PCR and a novel pooling strategy to minimize the
number of laboratory procedures required to se-
quence the unknown DNA that falls in between con-
tiguous sequences. Multiplex sequencing, a novel pro-
cedure in which multiple PCR primers are used in a
single sequencing reaction, is used to interpret the
multiplex PCR results. Two protocols are presented,
one that minimizes pipetting and another that minimizes
the number of reactions. The pipette-optimized
multiplex PCR method has been employed in the final
phases of closing the Streptococcus pneumoniae genome
sequence, with excellent results.
© 1999 Academic Press
INTRODUCTION
In the late stages of a whole-genome shotgun se-
quencing project, most DNA sequences will be assem-
bled into large contiguous blocks, or contigs (Fraser
and Fleischmann, 1997). As the project nears comple-
tion, the number of contigs grows smaller as the con-
tigs themselves grow larger. Due to nonrandomness in
the library and unclonable sequences, some regions of
the genome are not represented in the contigs, result-
ing in gaps. Other gaps result from extremely GC-rich
or GC-poor regions and large repeat sequences. Signif-
icant effort is needed to close these gaps to finish the
project. Constructing the initial shotgun clone library
with plasmid vectors allows double-strand sequencing,
using universal forward and reverse primers, produc-
ing sequence data from both ends of most clones. In
some cases, gaps between contigs will be spanned by
clones whose forward sequencing read is located at the
extreme end of one contig and whose reverse read
(“clone mate”) is located at the end of another contig.
Such gaps are called sequence gaps, and they can be
“walked” by synthetic primers using the shotgun clone
as a template (e.g., see gap closure methods in Fleisch-
mann et al. (1995)). However, many contig ends will
remain unlinked, especially when no physical map of
the genome is available. Therefore the order of these
contigs and the size of the gaps in between them are
unknown.
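To make the distinction between sequence gaps and unlinked contig ends concrete, the following Python sketch pairs forward and reverse reads of the same clone and reports the pairs that land near the ends of two different contigs. It is only an illustration under stated assumptions: the `Read` record, the `end_window` cutoff, and the function name `find_sequence_gaps` are hypothetical and are not taken from any assembler or from the procedure used in this project, and read orientation is ignored for brevity.

```python
from collections import namedtuple

# Hypothetical record for one read placed in an assembly (not a real assembler format).
# clone: clone identifier; direction: "F" (forward) or "R" (reverse);
# contig: contig name; start/end: 0-based placement of the read within the contig.
Read = namedtuple("Read", "clone direction contig start end")


def find_sequence_gaps(reads, contig_lengths, end_window=2000):
    """Map each pair of contigs to the clones whose two mates link them.

    A clone links two contigs when its forward and reverse reads lie within
    `end_window` bases of the ends of two different contigs (assumed cutoff).
    """
    def near_end(read):
        length = contig_lengths[read.contig]
        return read.start < end_window or read.end > length - end_window

    # Group the reads of each clone by direction.
    by_clone = {}
    for read in reads:
        by_clone.setdefault(read.clone, {})[read.direction] = read

    gaps = {}
    for clone, mates in by_clone.items():
        fwd, rev = mates.get("F"), mates.get("R")
        if fwd is None or rev is None:
            continue  # only one end of this clone was sequenced
        if fwd.contig == rev.contig:
            continue  # both mates fall in the same contig, so no gap is spanned
        if near_end(fwd) and near_end(rev):
            key = frozenset((fwd.contig, rev.contig))
            gaps.setdefault(key, []).append(clone)
    return gaps


# Toy data, invented for illustration only.
example_reads = [
    Read("c1", "F", "contigA", 48500, 49000),
    Read("c1", "R", "contigB", 100, 600),
    Read("c2", "F", "contigA", 20000, 20500),
    Read("c2", "R", "contigA", 21500, 22000),
    Read("c3", "F", "contigC", 10, 500),
]
example_lengths = {"contigA": 50000, "contigB": 30000, "contigC": 8000}
example_gaps = find_sequence_gaps(example_reads, example_lengths)
```

In this toy input only clone `c1` links two contigs, so the gap between `contigA` and `contigB` would be a sequence gap, while `contigC` has no linking clone and its ends remain unlinked.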
Some of these unlinked contig ends, or physical ends, can be extended by
primer walking directly on genomic DNA (Heiner et al.,
1998). The efficiency of this approach is highly depen-
dent on the purity and integrity of the genomic DNA,
but it can be useful in linking more sequences or con-
tigs to the contig’s end. Genomic primer walking be-
comes tedious if the gap is larger than a few hundred
basepairs, and any contigs linked this way still need to
be checked by PCR to confirm their order in the overall
genome. Walking on genomic DNA is possible only if
the region of interest is unique in the genome, so that
the walking primer will hybridize at only one location
on the DNA and produce a unique sequence. Unfortu-
nately, physical ends (and gaps) are frequently the
result of repetitive sequences that cannot be resolved
by sequence assembly algorithms. Because such repeats
are usually longer than the average sequence
read (otherwise they would not have caused a problem for the
assembler), walking with a primer located outside the
repeat cannot cross the repeat and therefore will
not extend the physical end into the gap.
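The uniqueness requirement for a walking primer can be illustrated with a small Python sketch that counts exact occurrences of a candidate primer on both strands of a known sequence. This is a deliberate simplification: real primers can also hybridize at sites with mismatches, which exact string matching does not model, and the function names below are hypothetical rather than part of any published tool.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def count_exact_sites(primer, genome):
    """Count exact matches of the primer on either strand (overlaps allowed)."""
    genome = genome.upper()
    primer = primer.upper()
    # A set avoids double counting when the primer equals its reverse complement.
    targets = {primer, reverse_complement(primer)}
    count = 0
    for target in targets:
        pos = genome.find(target)
        while pos != -1:
            count += 1
            pos = genome.find(target, pos + 1)
    return count


def is_unique_primer(primer, genome):
    """True when the primer has exactly one exact site in the sequence."""
    return count_exact_sites(primer, genome) == 1


# Toy sequence, invented for illustration: a 7-base repeat flanks a unique region.
toy_genome = "AAAA" + "TTGACCA" + "CCGGATAGC" + "TTGACCA" + "GGGG"
unique_ok = is_unique_primer("CCGGATAGC", toy_genome)
repeat_sites = count_exact_sites("TTGACCA", toy_genome)
```

Here the primer placed in the unique region has a single site, whereas a primer placed inside the repeat has two and would yield a mixed sequence if used for walking on genomic DNA.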
This problem can be circumvented by generating
PCR products across each gap using unique primers
located outside repeats. These PCR products can sub-
sequently be walked using the product itself as a tem-
plate, where the repeats do not cause a problem be-
cause they are unique within the PCR product (except
in the case of long tandem repeats). In addition, PCR
products do not need to be cloned prior to sequencing,
and therefore regions potentially toxic to the host (an-
other cause of gaps in a shotgun sequencing project)
will nonetheless be sequenced.
To cover all gaps with PCR products, each physical
end must be tested by PCR against all of the other
1 To whom correspondence should be addressed. Telephone: (301)
838-3542. Fax: (301) 838-0208. E-mail: tettelin@tigr.org.
Genomics 62, 500–507 (1999)
Article ID geno.1999.6048, available online at http://www.idealibrary.com