© 2014 Nature America, Inc. All rights reserved.
NATURE BIOTECHNOLOGY ADVANCE ONLINE PUBLICATION
ARTICLES
Biological systems are able to build intricate materials and chemicals
that require precise dynamic and spatial control over many genes.
However, engineering large systems that are composed of many genetic
parts is not straightforward. First, software is focused on combining
parts at the DNA sequence level, which can make the design process
time consuming. Second, it takes months to prototype a design.
Although DNA synthesis is routine for individual genes¹ and, indeed,
has been used to build entire 1-Mb genomes², it remains too costly to
simultaneously synthesize many large alternative designs. In practice,
it is feasible to build only a few designs for testing, so finding a design
that works can take considerable time.
This is further complicated when working with large, naturally
occurring systems that are the product of evolutionary forces and have
redundant and often overlapping regulatory elements³,⁴. Even in
well-characterized systems, not all of the regulation or regulatory parts
(for example, promoters) are known³. Starting with such a system, design
choices cannot be cleanly implemented without triggering a web of
secondary effects. For example, a desired change in gene order may be
tolerable in itself, but if there are promoters internal to the ORFs, this could
create transcriptional interference. Overlapping genetic elements also
thwart part substitutions; if genes are translationally coupled, this
complicates codon optimization or the substitution of a ribosome binding
site (RBS), because either change will have a secondary impact on
neighboring genes.
Refactoring is an engineering approach to clean up a natural genetic
system⁵. The goal is to create a fully defined and modular system
by systematically eliminating native regulation and replacing it with
well-characterized parts⁶. First, refactoring removes complex, multigene
pathways from the control of the host and places them under
the control of synthetic genetic sensors and circuits. This eliminates
the effect of the many environmental and cellular inputs that can
influence a system and enables it to be controlled with an inducible
switch or as the output of a circuit⁷. Second, refactoring facilitates
the large-scale part swapping and engineering that is required for
species transfer. Each species speaks a different regulatory language,
and refactoring simplifies the conversion of the code from one to
another (codon optimizing each gene, converting ribosome binding
sites and so on).
Here, we start with a refactored version of the nif gene cluster
that encodes the enzymes necessary for nitrogenase activity from
Klebsiella oxytoca⁶. Nitrogen fixation is a key process in agriculture,
involving the conversion of atmospheric N₂ to ammonia, and since
the 1970s it has been a goal in biotechnology to move this function
into cereal crops to reduce the use of chemically derived fertilizer⁸. In
Klebsiella, the native cluster contains 20 genes encoded in 7 operons,
altogether comprising 25 kb and encoding regulatory proteins, the
nitrogenase enzyme, chaperones, electron transport proteins and the
biosynthetic pathway for the iron-molybdenum cofactor (FeMo-co)
and other metalloclusters⁹. Under appropriate environmental conditions,
the cluster is highly expressed (nifH alone accounts for 10%
of cell weight), and activity is balanced to avoid H₂ generation as
Functional optimization of gene clusters by combinatorial design and assembly
Michael J Smanski¹, Swapnil Bhatia², Dehua Zhao¹, YongJin Park¹, Lauren B A Woodruff¹,³, Georgia Giannoukos³,
Dawn Ciulla³, Michele Busby³, Johnathan Calderon¹, Robert Nicol³, D Benjamin Gordon¹,³, Douglas Densmore² &
Christopher A Voigt¹,³
Large microbial gene clusters encode useful functions, including energy utilization and natural product biosynthesis, but genetic
manipulation of such systems is slow, difficult and complicated by complex regulation. We exploit the modularity of a refactored
Klebsiella oxytoca nitrogen fixation (nif) gene cluster (16 genes, 103 parts) to build genetic permutations that could not be
achieved by starting from the wild-type cluster. Constraint-based combinatorial design and DNA assembly are used to build
libraries of radically different cluster architectures by varying part choice, gene order, gene orientation and operon occupancy.
We construct 84 variants of the nifUSVWZM operon, 145 variants of the nifHDKY operon, 155 variants of the nifHDKYENJ
operon and 122 variants of the complete 16-gene pathway. The performance and behavior of these variants are characterized
by nitrogenase assay and strand-specific RNA sequencing (RNA-seq), and the results are incorporated into subsequent design
cycles. We have produced a fully synthetic cluster that recovers 57% of wild-type activity. Our approach allows the performance
of genetic parts to be quantified simultaneously in hundreds of genetic contexts. This parallelized design-build-test-learn cycle,
which can access previously unattainable regions of genetic space, should provide a useful, fast tool for genetic optimization and
hypothesis testing.
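As a rough illustration of what a constraint-based combinatorial design space looks like, the sketch below enumerates gene orders and orientations for the four genes of the nifHDKY operon and filters them with rule functions. It is a minimal sketch under stated assumptions, not the authors' design software: the function names and both constraints are hypothetical, and part choice and operon occupancy are left out for brevity.

```python
from itertools import permutations, product

# Genes of the nifHDKY operon named in the abstract.
GENES = ("nifH", "nifD", "nifK", "nifY")


def enumerate_designs(genes, constraints=()):
    """Yield each design as a tuple of (gene, strand) pairs.

    A design fixes a gene order and a strand ("+" or "-") per gene.
    Designs that fail any constraint function are skipped.
    """
    for order in permutations(genes):
        for strands in product("+-", repeat=len(genes)):
            design = tuple(zip(order, strands))
            if all(rule(design) for rule in constraints):
                yield design


# Hypothetical constraints, for illustration only (not taken from the paper).
def h_before_d(design):
    """Require nifH to appear earlier than nifD in the listed order."""
    names = [gene for gene, _ in design]
    return names.index("nifH") < names.index("nifD")


def at_most_one_reversed(design):
    """Allow at most one gene on the reverse strand."""
    return sum(strand == "-" for _, strand in design) <= 1


unconstrained = list(enumerate_designs(GENES))
constrained = list(enumerate_designs(GENES, (h_before_d, at_most_one_reversed)))
print(len(unconstrained))  # 24 orders x 16 strand patterns = 384
print(len(constrained))    # 12 orders x 5 strand patterns = 60
```

Order and orientation alone give n! x 2^n designs for n genes, so exhaustive enumeration like this is only practical for small operons. For larger clusters, constraints would have to prune the space before any design is built.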
¹Synthetic Biology Center, Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA.
²Electrical and Computer Engineering Department, Boston University, Boston, Massachusetts, USA.
³Broad Technology Labs, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA.
Correspondence should be addressed to C.A.V. (cavoigt@gmail.com).
Received 14 March; accepted 7 October; published online 24 November 2014; doi:10.1038/nbt.3063