Progress, challenges and the future of crop genomes

Todd P Michael¹ and Robert VanBuren²

The availability of plant reference genomes has ushered in a new era of crop genomics. More than 100 plant genomes have been sequenced since 2000, 63% of which are crop species. These genome sequences provide insight into the architecture and evolution of crop genomes, and into novel aspects such as the retention of key agronomic traits after whole genome duplication events. Some crops have very large, polyploid, repeat-rich genomes, which require innovative strategies for sequencing, assembly and analysis. Even low-quality reference genomes have the potential to improve crop germplasm through genome-wide molecular markers, which reduce the need for expensive phenotyping and shorten breeding cycles. The next stage of plant genomics will require refining draft genomes, building resources for crop wild relatives, resequencing broad diversity panels, and undertaking plant ENCODE projects to better understand the complexities of these highly diverse genomes.

Addresses
¹ Ibis Biosciences, Carlsbad, CA, United States
² Donald Danforth Plant Science Center, St. Louis, MO, United States

Corresponding author: VanBuren, Robert (bob.vanburen@gmail.com)

Current Opinion in Plant Biology 2015, 24:71–81

This review comes from a themed issue on Genome studies and molecular genetics. Edited by Insuk Lee and Todd C Mockler.

http://dx.doi.org/10.1016/j.pbi.2015.02.002

1369-5266/© 2015 Elsevier Ltd. All rights reserved.

Introduction

After the release of the Arabidopsis genome in 2000 [1] and the advent of Next Generation Sequencing (NGS) technology in 2005, the number of sequenced plant genomes has rapidly increased to more than 100 ([2]; List of sequenced plant genomes, URL: https://genomevolution.org/wiki/index.php/Sequenced_plant_genomes).
Nearly two-thirds (63%) of the sequenced plant genomes are from crops, while model species, non-model species and crop wild relatives make up the remainder; three-fourths (76%) of the sequenced plant genomes are from eudicots and one-fifth (19%) are from monocots. Few genomes from non-flowering plants have been published thus far, with only three from the Gymnospermae, one from the Bryophyta and one from the Lycopodiophyta (Figure 1, Table 1). The high throughput and low cost of NGS technologies made it possible to sequence crops with lower economic value or large genomes, and paved the way for establishing new model species. The complexity and size of some crop genomes made traditional Sanger sequencing cost-prohibitive. The wheat genome, for instance, is hexaploid, 90% repetitive and 17 gigabases (Gb) in size, while sugarcane ranges in ploidy up to decaploid and has a 12 Gb genome that is 80% repetitive. Although sequencing capacity and computational power are increasing exponentially, numerous challenges remain, and both novel methodologies and legacy techniques are needed to crack these seemingly impossible genomes.

Model plant genomes such as Arabidopsis [1], Brachypodium distachyon [3], Physcomitrella patens (moss [4]) and Setaria italica [5,6] serve as an engine for research, while others such as Oryza sativa (rice [7,8]), Populus trichocarpa (poplar [9]), Zea mays (maize [10]), Glycine max (soybean [11]), Solanum lycopersicum (tomato [12]) and Pinus taeda (loblolly pine [13]) serve a dual purpose, as both crops and functional models. Together these genomes have laid the foundation for an era of molecular genomics research that has enabled the functional characterization of many key genes and pathways.

Non-model and non-crop plant genomes provide important clues to plant genome architecture and the evolution of flowering plants.
Although it was thought that plants have a ‘one-way ticket to genome obesity’ as a result of the retention of proliferating transposable elements (TEs) [14], the smallest plant genomes [15], those of Utricularia gibba (bladderwort) and Genlisea aurea (corkscrew plant), provided evidence that almost all intergenic space and repeat sequence can be purged [16,17]. In addition, the aquatic, highly morphologically reduced, non-grass monocot Spirodela polyrhiza (greater duckweed) has a genome similar in size to that of Arabidopsis, yet functions with 28% fewer genes (19,623) [18]. The genomes of Selaginella moellendorffii (spikemoss [19]) and Amborella trichopoda [20] provide evolutionary anchor points for the vascular plants and the angiosperms, respectively, yielding key insights into the trajectory of plant-specific gene families and the radiation of flowering plants.

In this review we focus primarily on the most recently sequenced specialty and row crop genomes, with an emphasis on the challenges and limitations of current genome sequencing techniques. We then turn to downstream work aimed at linking the genome to the biology, and conclude with the future of plant genomics.