© 2011 Nature America, Inc. All rights reserved.
NATURE GENETICS ADVANCE ONLINE PUBLICATION
ARTICLES

Genomes are shaped by the interaction of diverse processes and evolutionary forces: recombination, gene conversion, mutation, selection and demography, as well as recurrent cycles of polyploidization and subsequent diploidization, along with hybridization and the associated processes of admixture and introgression. Disentangling the effects of these processes on sequence variation is essential not only for understanding how genetic diversity is generated and maintained but also for tracking down allelic variants responsible for phenotypic variation. A. thaliana and its close relatives have been at the forefront of investigations of these processes in plants 1,2. For example, both the local and global population structures of A. thaliana, which reflect the species’ migration history since the Ice Age as well as the surprisingly frequent outcrossing events between the inbred strains, have been studied in considerable detail 3,4. The first genome-wide haplotype map of a plant was produced for this species 5, and the information from this endeavor has already been successfully used for genome-wide association studies (GWAS) 6–9.

Despite the rapid progress in linking genotype to phenotype, a major gap remains in the ability to identify alleles that are directly responsible for variation in adaptive traits. As in humans, the complete sequencing of genomes provides an essential stepping stone toward this goal. Moreover, the recent completion of a reference genome sequence for the species’ closest relative, Arabidopsis lyrata, is informing the interpretation of polymorphism patterns in A. thaliana 10. Exploratory efforts with a small number of strains suggested early on that short-read sequencing is an efficient means of describing whole-genome sequence variation in A.
thaliana 11,12, and on the basis of early successes, a 1001 Genomes Project for the species has been advocated 13 (see URLs for project website). Here we present results from the first major phase of the 1001 Genomes Project, an analysis of 80 strains that were chosen to represent the genetic diversity present in eight populations across the entire native range of the species. The study design supports systematic investigation of the effects of geography and demography on whole-genome sequence variation.

RESULTS
Sequencing of 80 A. thaliana accessions
The native range of A. thaliana is in Eurasia, spanning varied climates and elevations, from the high mountains of Central Asia to the European Atlantic Coast, and from North Africa to the Arctic Circle. To enable the discovery of both global and local effects on sequence diversity, we focused on six larger geographic regions: the Iberian Peninsula with North Africa; Southern Italy; Eastern Europe; the Caucasus; Southern Russia; and Central Asia. In addition, we sampled two much smaller regions, Swabia, in the southwest of Germany, and South Tyrol, in the north of Italy (Fig. 1). From each region, we selected 7–14 naturally inbred strains, or accessions, that we had identified as genetically diverse on the basis of limited genome-wide genotyping (Fig. 1a and Supplementary Table 1). From a single individual

Whole-genome sequencing of multiple Arabidopsis thaliana populations

Jun Cao 1,8, Korbinian Schneeberger 1,2,8, Stephan Ossowski 1,3,4,8, Torsten Günther 5,8, Sebastian Bender 1, Joffrey Fitz 1, Daniel Koenig 1, Christa Lanz 1, Oliver Stegle 6, Christoph Lippert 6, Xi Wang 1, Felix Ott 1, Jonas Müller 1, Carlos Alonso-Blanco 7, Karsten Borgwardt 6, Karl J Schmid 5 & Detlef Weigel 1

The plant Arabidopsis thaliana occurs naturally in many different habitats throughout Eurasia.
As a foundation for identifying genetic variation contributing to adaptation to diverse environments, a 1001 Genomes Project to sequence geographically diverse A. thaliana strains has been initiated. Here we present the first phase of this project, based on population-scale sequencing of 80 strains drawn from eight regions throughout the species’ native range. We describe the majority of common small-scale polymorphisms as well as many larger insertions and deletions in the A. thaliana pan-genome, their effects on gene function, and the patterns of local and global linkage among these variants. The action of processes other than spontaneous mutation is identified by comparing the spectrum of mutations that have accumulated since A. thaliana diverged from its closest relative 10 million years ago with the spectrum observed in the laboratory. Recent species-wide selective sweeps are rare, and potentially deleterious mutations are more common in marginal populations.

1 Max Planck Institute for Developmental Biology, Tübingen, Germany.
2 Max Planck Institute of Plant Breeding Research, Cologne, Germany.
3 Center for Genomic Regulation, Barcelona, Spain.
4 Universitat Pompeu Fabra, Barcelona, Spain.
5 Institute of Plant Breeding, Seed Science and Population Genetics, University of Hohenheim, Stuttgart, Germany.
6 Machine Learning and Computational Biology Research Group, Max Planck Institute for Intelligent Systems and Max Planck Institute for Developmental Biology, Tübingen, Germany.
7 Centro Nacional de Biotecnología, Consejo Superior de Investigaciones Científicas, Madrid, Spain.
8 These authors contributed equally to this work.
Correspondence should be addressed to D.W. (weigel@weigelworld.org).

Received 8 March; accepted 26 July; published online 28 August 2011; doi:10.1038/ng.911