Short-Read Assembly of Full-Length 16S Amplicons Reveals Bacterial Diversity in Subsurface Sediments

Christopher S. Miller*¤a, Kim M. Handley¤b¤c, Kelly C. Wrighton, Kyle R. Frischkorn, Brian C. Thomas, Jillian F. Banfield

Department of Earth and Planetary Science, University of California, Berkeley, California, United States of America

Abstract

In microbial ecology, a fundamental question relates to how community diversity and composition change in response to perturbation. Most studies have had limited ability to deeply sample community structure (e.g. Sanger-sequenced 16S rRNA libraries), or have had limited taxonomic resolution (e.g. studies based on 16S rRNA hypervariable region sequencing). Here, we combine the higher taxonomic resolution of near-full-length 16S rRNA gene amplicons with the economics and sensitivity of short-read sequencing to assay the abundance and identity of organisms that represent as little as 0.01% of sediment bacterial communities. We used a new version of EMIRGE optimized for large data size to reconstruct near-full-length 16S rRNA genes from amplicons sheared and sequenced with Illumina technology. The approach allowed us to differentiate the community composition among samples acquired before perturbation, after acetate amendment shifted the predominant metabolism to iron reduction, and once sulfate reduction began. Results were highly reproducible across technical replicates, and identified specific taxa that responded to the perturbation. All samples contain very high alpha diversity and abundant organisms from phyla without cultivated representatives. Surprisingly, at the time points measured, there was no strong loss of evenness, despite the selective pressure of acetate amendment and change in the terminal electron accepting process. However, community membership was altered significantly.
The method allows for sensitive, accurate profiling of the “long tail” of low abundance organisms that exist in many microbial communities, and can resolve population dynamics in response to environmental change.

Citation: Miller CS, Handley KM, Wrighton KC, Frischkorn KR, Thomas BC, et al. (2013) Short-Read Assembly of Full-Length 16S Amplicons Reveals Bacterial Diversity in Subsurface Sediments. PLoS ONE 8(2): e56018. doi:10.1371/journal.pone.0056018

Editor: Jack Anthony Gilbert, Argonne National Laboratory, United States of America

Received December 3, 2012; Accepted January 9, 2013; Published February 6, 2013

Copyright: © 2013 Miller et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: Funding was provided by the IFRC, Subsurface Biogeochemical Research Program and the Knowledgebase Program (DE-AC02-05CH11231), Office of Science, Biological and Environmental Research, US Department of Energy (DOE). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: The authors have declared that no competing interests exist.

* E-mail: christopher.s.miller@ucdenver.edu

¤a Current address: Department of Integrative Biology, University of Colorado Denver, Denver, Colorado, United States of America
¤b Current address: Computation Institute, University of Chicago, Chicago, Illinois, United States of America
¤c Current address: Computing, Environment and Life Sciences, Argonne National Laboratory, Argonne, Illinois, United States of America

Introduction

Microbial communities respond to, and effect change on, surrounding geochemical conditions.
Advances in community proteogenomics and transcriptomics have made it possible to understand the molecular basis of this interplay for some communities of interest [1–5]. However, most inferences of microbe-environment interactions are still made with molecular surveys of community-wide taxonomic affiliation. For many years, the phylogenetic marker gene of choice for such surveys has been the small subunit (SSU) ribosomal RNA (rRNA) gene, due to its high conservation across the domains of life and the ability to PCR-amplify the sequences from complex communities with so-called “universal” conserved primers [6,7]. Currently, both the SILVA and Greengenes SSU databases contain nearly half a million high-quality sequences that can be used to place genes from newly characterized communities in context [8,9].

While tens to thousands of full-length rRNA gene sequences are collected via Sanger sequencing of cloned PCR products, hundreds of thousands to millions of short hypervariable fragments from this gene can be analyzed using 454 sequencing. Early studies inferred community composition with reads of approximately 100 bp [10]. Subsequent studies used longer reads, and sometimes targeted alternative hypervariable regions [11–13]. With 454 pyrosequencing of hypervariable regions for community characterization, care has to be taken to distinguish novel sequences from sequence variants introduced by the high error rate [14–16].

In recent years, many groups have exploited the scale and economics afforded by hundreds of millions of Illumina reads to survey microbial community composition [17–24]. Typically, the strategy has been one borrowed directly from the initial 454-based surveys: PCR-amplify one or more hypervariable regions of the SSU gene and use the short sequenced tags to infer phylogeny.
Because of the short read lengths (typically 100–150 bp) and error rate, a read quality-filtering step is usually employed prior to identification of operational taxonomic units (OTUs). Caporaso et al. observed that, in a mock community, diversity was overestimated unless confident sequences were observed at least 10,000 times in an experiment, a level that represented ≥0.01% of reads [18]. Although many groups have been able to distinguish communities using single-end reads [18,19,22], others have attempted to correct errors by choosing sequencing primers so

PLOS ONE | www.plosone.org 1 February 2013 | Volume 8 | Issue 2 | e56018