Feature Review
Tools and Strategies for Long-Read Sequencing and De Novo Assembly of Plant Genomes
Hyungtaek Jung,1,* Christopher Winefield,2 Aureliano Bombarely,3,4 Peter Prentis,5 and Peter Waterhouse1,6,*
The commercial release of third-generation sequencing technologies (TGSTs), giving long and ultra-long sequencing reads, has stimulated the development of new tools for assembling highly contiguous genome sequences with unprecedented accuracy across complex repeat regions. We survey here a wide range of emerging sequencing platforms and analytical tools for de novo assembly, provide background information for each of their steps, and discuss the spectrum of available options. Our decision tree recommends workflows for the generation of a high-quality genome assembly when used in combination with the specific needs and resources of a project.
Challenges and Progress with Plant Genomics
A genome assembly is simply the sequence produced after all of the chromosomes of a target species have been fragmented into a large number of short or long DNA sequences, sequenced, and computationally put back together again to create a representation of the original intact chromosome sequences. De novo genome assembly assumes no prior knowledge of the source DNA sequence length, layout, or composition. The usual aim of a genome assembly is to build a highly accurate contiguous (i.e., an uninterrupted stretch of overlapping DNA) consensus sequence representing a haploid-phase version of the genome (one for each parental haplotype) of the target species. The costs of acquiring sufficient sequence data for such an assembly have now dropped to a level that most laboratories can afford. This has led to a recent explosion in the number of plant species being sequenced.
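The idea of fragmenting a sequence and computationally putting it back together can be illustrated with a deliberately simplified sketch. The example below is a toy greedy overlap merger, not the method of any assembler discussed in this review: the function names (`overlap`, `greedy_assemble`), the 24-base "genome", and the error-free, single-strand reads are all assumptions made for illustration. Real assemblers must additionally cope with sequencing errors, both DNA strands, repeats, and heterozygosity.

```python
from __future__ import annotations

from itertools import permutations


def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Return the length of the longest suffix of `a` that is a prefix of `b`.

    Overlaps shorter than `min_len` are ignored and reported as 0.
    """
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two sequences that share the largest overlap.

    Toy assumption: reads are error-free, come from one strand, and the
    source sequence contains no repeats of length >= min_len.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_a, best_b = 0, None, None
        for a, b in permutations(contigs, 2):
            size = overlap(a, b, min_len)
            if size > best_len:
                best_len, best_a, best_b = size, a, b
        if best_len == 0:
            return contigs  # nothing left to merge
        contigs.remove(best_a)
        contigs.remove(best_b)
        contigs.append(best_a + best_b[best_len:])


# Hypothetical 24-base "genome" in which every 3-mer is unique.
genome = "ATGCGTACCTGAAGGCTTAGTCAC"

# "Fragment" it into overlapping 12-base reads, supplied out of order.
starts = [8, 0, 12, 4]
reads = [genome[s:s + 12] for s in starts]

contigs = greedy_assemble(reads)
print(contigs == [genome])
```

Because no prior knowledge of the source sequence is used beyond the reads themselves, this mirrors the de novo setting in miniature. The sketch only succeeds here because the toy sequence is free of repeats; a repeated segment longer than the overlap between reads would make several merges equally plausible.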
Four questions must be considered when embarking on a new genome assembly project: (i) how big is the genome?; (ii) is it a diploid, polyploid, and/or highly heterozygous hybrid species?; (iii) how much repetitive sequence is likely to be present in the genome?; and (iv) what is the best experimental and computational design to employ? Most large plant genomes have high levels of repeated and duplicated sequences owing to whole-genome, chromosomal, subchromosomal, or tandem duplications (e.g., transposable element activity) [1,2]. With genome assemblies based on short-read (75–700 bp) data, the repeats and duplications are often not well resolved, leading to the bioinformatic formation of chimeric sequences (see Glossary) and fragmented contigs. Third-generation sequencing platforms (Pacific Biosciences, PacBio; and Oxford Nanopore Technologies, ONT), which generate individual read lengths from 8 kb to 40 kb (maximum >150 kb for PacBio and >2 Mb for ONT) [3], give much better resolution and contiguity. Nevertheless, some regions of a genome, such as the telomeric and centromeric regions of chromosomes, are often poorly resolved because they can contain megabases of repeated sequences. Current bioinformatic software does not cope well with these difficult regions, especially in the complex and polyploid genomes of many
Highlights
Tumbling sequencing costs, improvements in bioinformatic pipelines, and increased access to high-performance computing capabilities have resulted in a perfect storm where nonspecialist genomics research groups are able to access, deploy, and generate de novo genome sequences in nonmodel plant systems.
However, generating a high-quality assembly for many plant species still presents significant challenges owing to genome size, complexity, and experimental and computational design.
Selecting the most appropriate sequencing and software platforms for a new genome project can be confusing and daunting because of the wide spectrum of available options and the performance quality of specific tools in different contexts.
1 Centre for Tropical Crops and Biocommodities, Queensland University of Technology, Brisbane, QLD 4001, Australia
2 Department of Wine, Food, and Molecular Biosciences, Lincoln University, 7647 Christchurch, New Zealand
3 Department of Bioscience, University of Milan, Milan 20133, Italy
4 School of Plants and Environmental Sciences, Virginia Tech, Blacksburg, VA 24061, USA
5 School of Earth, Environmental, and Biological Sciences, Queensland University of Technology, Brisbane, QLD 4001, Australia
6 School of Biological Sciences, University of Sydney, Sydney, NSW 2006, Australia
Trends in Plant Science, Month 2019, Vol. xx, No. xx https://doi.org/10.1016/j.tplants.2019.05.003
© 2019 Elsevier Ltd. All rights reserved.