Genome-Wide Identification of Transcriptional Start Sites in the Plant Pathogen Pseudomonas syringae pv. tomato str. DC3000

Melanie J. Filiatrault 1,2 *, Paul V. Stodghill 1,2 , Christopher R. Myers 3,4 , Philip A. Bronstein 1,2¤ , Bronwyn G. Butcher 2 , Hanh Lam 2 , George Grills 3 , Peter Schweitzer 3 , Wei Wang 3 , David J. Schneider 1,2 , Samuel W. Cartinhour 1,2

1 Plant-Microbe Interactions Research Unit, Robert W. Holley Center for Agriculture and Health, Agricultural Research Service, United States Department of Agriculture, Ithaca, New York, United States of America
2 Department of Plant Pathology and Plant-Microbe Biology, Cornell University, Ithaca, New York, United States of America
3 Life Sciences Core Laboratories Center, Cornell University, Ithaca, New York, United States of America
4 Laboratory of Atomic and Solid State Physics, Cornell University, Ithaca, New York, United States of America

Abstract

RNA-Seq has provided valuable insights into global gene expression in a wide variety of organisms. Using a modified RNA-Seq approach and Illumina’s high-throughput sequencing technology, we globally identified 5′-ends of transcripts for the plant pathogen Pseudomonas syringae pv. tomato str. DC3000. A substantial fraction of 5′-ends obtained by this method were consistent with results obtained using global RNA-Seq and 5′ RACE. As expected, many 5′-ends were positioned a short distance upstream of annotated genes. We also captured 5′-ends within intergenic regions, providing evidence for the expression of un-annotated genes and non-coding RNAs, and detected numerous examples of antisense transcription, suggesting additional levels of complexity in gene regulation in DC3000. Importantly, targeted searches for sequence patterns in the vicinity of 5′-ends revealed over 1200 putative promoters and other regulatory motifs, establishing a broad foundation for future investigations of regulation at the genomic and single gene levels.
Citation: Filiatrault MJ, Stodghill PV, Myers CR, Bronstein PA, Butcher BG, et al. (2011) Genome-Wide Identification of Transcriptional Start Sites in the Plant Pathogen Pseudomonas syringae pv. tomato str. DC3000. PLoS ONE 6(12): e29335. doi:10.1371/journal.pone.0029335

Editor: Szabolcs Semsey, Niels Bohr Institute, Denmark

Received July 13, 2011; Accepted November 25, 2011; Published December 28, 2011

This is an open-access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.

Funding: This work was supported by United States Department of Agriculture-Agricultural Research Service under CRIS project #1907-21000-027-00D. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: The authors have declared that no competing interests exist.

* E-mail: Melanie.filiatrault@ars.usda.gov

¤ Current address: United States Department of Agriculture-Food Safety and Inspection Service, Washington, D.C., United States of America

Introduction

Pseudomonas syringae pv. tomato strain DC3000 (DC3000) is a phytopathogen of tomato and Arabidopsis and is the focus of many molecular plant-microbe interaction studies. Sequencing and annotation of the DC3000 genome and its two plasmids were completed by The Institute for Genomic Research (TIGR) in 2003 [1]. Since then, a number of genomic studies have revealed important details regarding the conservation and distribution of Type III effectors, potential virulence factors, and the phylogenetic scope of P. syringae [2].
Although genome sequencing has provided a wealth of knowledge, the primary genome sequence represents only the first stage in understanding complex cellular processes that are critical to the survival of plant pathogens, such as sensing and responding to environmental signals. Since these behaviors rely on the coordinated expression of genome content, it is necessary to examine the transcriptome, proteome, and metabolome in detail. RNA-Seq has emerged as a high-throughput strategy to analyze bacterial transcriptomes on a global scale (see reviews: [3–8]). This deep-sequencing approach has uncovered complex transcriptional activity, provided high-throughput validation of gene predictions, and efficiently revealed regulatory non-coding RNAs (ncRNAs), transcriptional start sites (TSSs), and antisense transcription in a number of bacteria. RNA-Seq protocols have been developed to target the 5′ region of transcripts, allowing the identification of larger numbers of putative transcriptional start sites and aiding in defining operons [9–11]. In addition, these approaches have detected and confirmed antisense activity [12]. Recently, a more efficient version of this strategy was employed to evaluate the primary transcripts of several human pathogens, C. trachomatis [13], C. pneumoniae [14], and H. pylori [15], and of the cyanobacterium Synechocystis [16]. These modified protocols exploit an enzyme that preferentially digests processed transcripts, thereby discriminating between primary and processed transcripts and yielding a data set enriched for transcriptional start sites. RNA-Seq has also been used to evaluate the transcriptome of DC3000 on a global scale [17]. That study generated valuable data concerning gene expression and provided important information that has been used for reannotation of the DC3000 genome [17]. However, very few confirmed transcriptional start sites have been reported for this pathogen.
Furthermore, promoter models are available for only two sigma factors, HrpL and PvdS [18,19]. Because the identification of promoters is critical to understanding gene regulation, we devised a protocol using Illumina’s high-throughput sequencing strategy to experimentally determine
PLoS ONE | www.plosone.org 1 December 2011 | Volume 6 | Issue 12 | e29335