DNA Subway – An Educational Bioinformatics Platform for Gene and Genome Analysis: DNA Barcoding, and RNA-Seq

J. Williams*,†, S. McKay‡, M. Khalfan*,†, C. Ghiban*,†, U. Hilgert†,§, Sue Lauter*,†, Eun-Sook Jeong*,†, and D. Micklos*,†

* Cold Spring Harbor Laboratory, Cold Spring Harbor, NY
† iPlant Collaborative, T.W. Keating Bioresearch Building
‡ Ontario Institute for Cancer Research, MaRS Centre, Toronto, ON, Canada
§ BIO5 Institute, T.W. Keating Bioresearch Building, U. Arizona, Tucson, AZ

ABSTRACT: DNA Subway is an educational bioinformatics platform developed by the iPlant Collaborative (NSF #DBI-0735191). DNA Subway bundles research-grade bioinformatics tools, high-performance computing, and databases into workflows with an easy-to-use interface. “Riding” DNA Subway lines, students can predict and annotate genes in up to 150 kb of DNA (Red Line), identify homologs in sequenced genomes (Yellow Line), identify species using DNA barcodes and phylogenetic trees (Blue Line), and examine RNA-Seq datasets for differential transcript abundance (Green Line). With support for plant and animal genomes, DNA Subway engages students in their own learning, bringing to life key concepts in molecular biology and genetics. DNA barcoding and RNA extraction wet-lab experiments support a variety of inquiry-based learning experiences using student-generated data. Products of student research can be exported, published, and utilized in follow-up experiments. DNA Subway is freely accessible online at dnasubway.iplantcollaborative.org.

Keywords: DNA barcoding, RNA-Seq, Undergraduate education

Introduction

High-throughput sequencing (HTS) and the related progress of computational biology have revolutionized nearly every aspect of life science investigation. However, the transition of this technology into undergraduate classrooms faces many obstacles.
The sense that much of the undergraduate biology curriculum is in need of an update is summarized in the National Research Council’s BIO2010 report: “In contrast to biological research, undergraduate biology education has changed relatively little during the past two decades. The ways in which most future research biologists are educated are geared to the biology of the past, rather than to the biology of the present or future” (NRC (2003), p. 1). Updating curricula in light of new technologies can be challenging given the speed at which technologies like HTS advance. Additionally, the textbooks and professional development resources needed to equip educators with the knowledge, tools, and confidence to address new topics necessarily take additional time to develop. For HTS applications in particular, teaching bioinformatics with genome-scale datasets depends on resources (e.g., software, high-performance computing, and data storage) that are often limiting, both in their availability and in the expertise needed to use them.

Fortunately, advances in technologies and good timing have produced promising solutions to these challenges. The cost of HTS has fallen more than 1000-fold since 2004 (NHGRI (2013)), and the amount of freely available data gives students real opportunities to contribute to a biology paradigm that operates along a continuum of research and education. This paper outlines how DNA Subway and other iPlant-related resources enable educators to take advantage of these opportunities while bringing HTS to their students.

Driven by educational design principles. DNA Subway was conceived to address a need not just for powerful tools, but also for a “classroom friendly” user interface, the lack of which is an acknowledged barrier to bioinformatics instruction (Cummings and Temple (2010)). In 2006, the iPlant Collaborative held a meeting on “Genomics in Education” at Washington University in St.
Louis, at which 44 faculty identified guiding requirements to shape the development of iPlant’s educational platforms: 1) Mix lecture and lab – students have limited patience for computer work and want a wet bench “hook”; 2) Enable student-scientist partnerships – someone has to care about the data generated by students; 3) Co-investigation – projects should potentially lead to publications; and 4) Scale – platforms should support distributed projects that multiple classrooms can join. Guided by these principles, 25 collaborators at 11 institutions aided the development of DNA Subway.

To accomplish our objectives, we integrated existing open-source tools into an approachable graphical user interface, adding novel software to fill gaps such as viewing trace files, viewing alignments, and exporting files. The purpose was to make it possible for educators to prepare streamlined lessons using the same tools and data that are accessible to researchers. Pipelines based on standard bioinformatics processes were released progressively: genome annotation (Red Line; launched 2010), identification of gene homologues and non-coding DNA using TARGeT (Yellow Line; 2010) (Han, Burnette, and Wessler (2009)), DNA barcoding and phylogenetics (Blue Line; 2011), and RNA-Seq analysis of transcript abundance (Green Line; beta launched 2013). DNA Subway makes more than 30 tools available through a web interface that mitigates pedagogical barriers to adoption, such as moving data (and students) from one platform to another, or using tools that