© 2014 Nature America, Inc. All rights reserved. NATURE BIOTECHNOLOGY, ADVANCE ONLINE PUBLICATION. ARTICLES

Biological systems are able to build intricate materials and chemicals that require precise dynamic and spatial control over many genes. However, engineering large systems that are composed of many genetic parts is not straightforward. First, software is focused on combining parts at the DNA sequence level, which can make the design process time consuming. Second, it takes months to prototype a design. Although DNA synthesis is routine for individual genes¹ and, indeed, has been used to build entire 1-Mb genomes², it remains too costly to simultaneously synthesize many large alternative designs. In practice, it is feasible to build only a few designs for testing, so finding a design that works can take considerable time. This is further complicated when working with large, naturally occurring systems that are the product of evolutionary forces and have redundant and often overlapping regulatory elements³,⁴. Even in well-characterized systems, not all of the regulation or regulatory parts (for example, promoters) are known³. Starting with such a system, design choices cannot be cleanly implemented without triggering a web of secondary effects. For example, a desired change in gene order may be tolerable in itself, but if there are promoters internal to the ORFs, this could create transcriptional interference. Overlapping genetic elements also thwart part substitutions; if genes are translationally coupled, this complicates codon optimization or the substitution of a ribosome binding site (RBS), where these will have a secondary impact on neighboring genes.

Refactoring is an engineering approach to clean up a natural genetic system⁵. The goal is to create a fully defined and modular system by systematically eliminating native regulation and replacing it with well-characterized parts⁶.
First, refactoring removes complex, multi-gene pathways from the control of the host and places them under the control of synthetic genetic sensors and circuits. This eliminates the influence of the many environmental and cellular inputs that can influence a system and enables it to be controlled with an inducible switch or as the output of a circuit⁷. Second, refactoring facilitates the large-scale part swapping and engineering that is required for species transfer. Each species speaks a different regulatory language, and refactoring simplifies the conversion of the code from one to another (codon optimizing each gene, converting ribosome binding sites and so on).

Here, we start with a refactored version of the nif gene cluster that encodes the enzymes necessary for nitrogenase activity from Klebsiella oxytoca⁶. Nitrogen fixation is a key process in agriculture, involving the conversion of atmospheric N₂ to ammonia, and since the 1970s it has been a goal in biotechnology to move this function into cereal crops to reduce the use of chemically derived fertilizer⁸. In Klebsiella, the native cluster contains 20 genes encoded in 7 operons, altogether comprising 25 kb and encoding regulatory proteins, the nitrogenase enzyme, chaperones, electron transport proteins and the biosynthetic pathway for the iron-molybdenum cofactor (FeMo-co) and other metalloclusters⁹.
Under appropriate environmental conditions, the cluster is highly expressed (nifH alone accounts for 10% of cell weight), and activity is balanced to avoid H₂ generation as

Functional optimization of gene clusters by combinatorial design and assembly

Michael J Smanski¹, Swapnil Bhatia², Dehua Zhao¹, YongJin Park¹, Lauren B A Woodruff¹,³, Georgia Giannoukos³, Dawn Ciulla³, Michele Busby³, Johnathan Calderon¹, Robert Nicol³, D Benjamin Gordon¹,³, Douglas Densmore² & Christopher A Voigt¹,³

Large microbial gene clusters encode useful functions, including energy utilization and natural product biosynthesis, but genetic manipulation of such systems is slow, difficult and complicated by complex regulation. We exploit the modularity of a refactored Klebsiella oxytoca nitrogen fixation (nif) gene cluster (16 genes, 103 parts) to build genetic permutations that could not be achieved by starting from the wild-type cluster. Constraint-based combinatorial design and DNA assembly are used to build libraries of radically different cluster architectures by varying part choice, gene order, gene orientation and operon occupancy. We construct 84 variants of the nifUSVWZM operon, 145 variants of the nifHDKY operon, 155 variants of the nifHDKYENJ operon and 122 variants of the complete 16-gene pathway. The performance and behavior of these variants are characterized by nitrogenase assay and strand-specific RNA sequencing (RNA-seq), and the results are incorporated into subsequent design cycles. We have produced a fully synthetic cluster that recovers 57% of wild-type activity. Our approach allows the performance of genetic parts to be quantified simultaneously in hundreds of genetic contexts. This parallelized design-build-test-learn cycle, which can access previously unattainable regions of genetic space, should provide a useful, fast tool for genetic optimization and hypothesis testing.
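As a rough illustration of what constraint-based combinatorial design can look like in practice, the following Python sketch enumerates gene orders and gene orientations for the four genes of the nifHDKY operon and filters them with a rule. It is a minimal sketch under stated assumptions: the function names and the constraint itself are hypothetical and are not taken from the design software used in this work, and part choice and operon occupancy are left out for brevity.

```python
from itertools import permutations, product

# Genes of the nifHDKY operon, as named in the abstract.
GENES = ("nifH", "nifD", "nifK", "nifY")


def enumerate_designs(genes, constraint):
    """Yield every (gene, strand) arrangement that satisfies `constraint`.

    A design is a tuple of (gene, strand) pairs in 5'-to-3' order,
    where strand is "+" or "-".
    """
    for order in permutations(genes):
        for strands in product("+-", repeat=len(genes)):
            design = tuple(zip(order, strands))
            if constraint(design):
                yield design


def hdk_same_strand(design):
    # Hypothetical rule, for illustration only:
    # nifH, nifD and nifK must all share one orientation.
    strands = {s for gene, s in design if gene in {"nifH", "nifD", "nifK"}}
    return len(strands) == 1


all_designs = list(enumerate_designs(GENES, lambda design: True))
allowed = list(enumerate_designs(GENES, hdk_same_strand))

print(len(all_designs))  # 4! orders x 2^4 orientations = 384
print(len(allowed))      # 24 orders x 4 allowed orientation sets = 96
```

Even for four genes, order and orientation alone give 384 arrangements, which suggests why rules are useful for narrowing a library to a size that can be built and tested.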
¹Synthetic Biology Center, Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA.
²Electrical and Computer Engineering Department, Boston University, Boston, Massachusetts, USA.
³Broad Technology Labs, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA.
Correspondence should be addressed to C.A.V. (cavoigt@gmail.com).

Received 14 March; accepted 7 October; published online 24 November 2014; doi:10.1038/nbt.3063