Natural CMT2 Variation Is Associated With Genome-Wide Methylation Changes and Temperature Seasonality

Xia Shen 1,2,3, Jennifer De Jonge 4, Simon K. G. Forsberg 1, Mats E. Pettersson 1, Zheya Sheng 1, Lars Hennig 4, Örjan Carlborg 1*

1 Swedish University of Agricultural Sciences, Department of Clinical Sciences, Division of Computational Genetics, Uppsala, Sweden
2 Karolinska Institutet, Department of Medical Epidemiology and Biostatistics, Stockholm, Sweden
3 University of Edinburgh, MRC Institute of Genetics and Molecular Medicine, MRC Human Genetics Unit, Edinburgh, United Kingdom
4 Swedish University of Agricultural Sciences, Department of Plant Biology, Uppsala, Sweden

Abstract

As Arabidopsis thaliana has colonized a wide range of habitats across the world it is an attractive model for studying the genetic mechanisms underlying environmental adaptation. Here, we used public data from two collections of A. thaliana accessions to associate genetic variability at individual loci with differences in climates at the sampling sites. We use a novel method to screen the genome for plastic alleles that tolerate a broader climate range than the major allele. This approach reduces confounding with population structure and increases power compared to standard genome-wide association methods. Sixteen novel loci were found, including an association between Chromomethylase 2 (CMT2) and temperature seasonality where the genome-wide CHH methylation was different for the group of accessions carrying the plastic allele. Cmt2 mutants were shown to be more tolerant to heat-stress, suggesting genetic regulation of epigenetic modifications as a likely mechanism underlying natural adaptation to variable temperatures, potentially through differential allelic plasticity to temperature-stress.

Citation: Shen X, De Jonge J, Forsberg SKG, Pettersson ME, Sheng Z, et al. (2014) Natural CMT2 Variation Is Associated With Genome-Wide Methylation Changes and Temperature Seasonality.
PLoS Genet 10(12): e1004842. doi:10.1371/journal.pgen.1004842

Editor: Gregory P. Copenhaver, The University of North Carolina at Chapel Hill, United States of America

Received April 14, 2014; Accepted October 21, 2014; Published December 11, 2014

Copyright: © 2014 Shen et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: The authors confirm that all data underlying the findings are fully available without restriction. All relevant data are within the paper and its Supporting Information files.

Funding: This work was funded by a EURYI-award and a SSF Future Research Leader Grant to ÖC, a Swedish Research Council grant (537-2014-371) to XS and a FORMAS grant (to LH). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: The authors have declared that no competing interests exist.

* Email: orjan.carlborg@slu.se

Introduction

Arabidopsis thaliana has colonized a wide range of habitats across the world and it is therefore an attractive model for studying the genetic mechanisms underlying environmental adaptation [1]. Several large collections of A. thaliana accessions have either been whole-genome re-sequenced or high-density SNP genotyped [1–7]. The included accessions have adapted to a wide range of different climatic conditions and therefore loci involved in climate adaptation will display genotype by climate-at-sampling-site correlations in these populations. Genome-wide association or selective-sweep analyses can therefore potentially identify signals of natural selection involved in environmental adaptation, if those can be disentangled from the effects of other population genetic forces acting to change the allele frequencies.
Selective-sweep studies are inherently sensitive to population structure; when structure is present, false-positive rates will be high because the available statistical methods cannot handle it properly. Further experimental validation of inferred sweeps (e.g. [1,8]) is hence necessary to suggest them as adaptive. In GWAS, kinship correction is now a standard approach to account for population structure that properly controls the false discovery rate. Unfortunately, correcting for genomic kinship often decreases the power to detect individual adaptive loci, which is likely the reason that no genome-wide significant associations to climate conditions were found in earlier GWAS analyses [1,8]. Nevertheless, a number of candidate adaptive loci could still be identified using extensive experimental validation [1,2,8], showing how valuable these populations are as a resource for finding the genomic footprint of climate adaptation.

Genome-wide association (GWA) datasets based on natural collections of A. thaliana accessions, such as the RegMap collection, are often genetically stratified. This is primarily due to the close relationships between accessions sampled at nearby locations. Furthermore, as the climate measurements used as phenotypes for the accessions are values representative of the sampling locations of the individual accessions, these measurements will be confounded with the general genetic relationship [9]. Unless properly controlled for, this confounding might lead to excessive false-positive signals in the association analysis: differences in allele frequencies at loci between locations that differ in climate, and that are at the same time geographically distant, will create an association between the genotype and the trait. However, this association could also be due to forces other than selection. In traditional GWA analyses, mixed-model based approaches are commonly used to control for population stratification.
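The confounding described above can be illustrated with a small simulation. The sketch below is not the authors' method: all names and parameter values (two subpopulations, allele frequencies of 0.1 and 0.9, a climate shift of 5 units) are hypothetical, and a simple adjustment for a known subpopulation label stands in for the kinship term of a mixed model. The simulated locus has no effect on the climate variable, yet the unadjusted genotype-climate correlation is strong, because both differ between the two subpopulations.

```python
import numpy as np


def simulate(n_per_pop=500, seed=0):
    """Simulate two geographically separated subpopulations (hypothetical values).

    The locus has NO causal effect on the climate variable; both simply
    differ between the subpopulations.
    """
    rng = np.random.default_rng(seed)
    pop = np.repeat([0, 1], n_per_pop)            # subpopulation label
    allele_freq = np.where(pop == 0, 0.1, 0.9)    # frequency differs by site
    genotype = rng.binomial(1, allele_freq)       # inbred accessions coded 0/1
    climate = 5.0 * pop + rng.normal(0.0, 1.0, size=pop.size)
    return pop, genotype, climate


def residualize(x, covariate):
    """Remove the linear effect of a covariate from x by least squares."""
    design = np.column_stack([np.ones(covariate.size), covariate])
    coef, *_ = np.linalg.lstsq(design, x.astype(float), rcond=None)
    return x - design @ coef


def correlation(x, y):
    """Pearson correlation between two vectors."""
    return float(np.corrcoef(x, y)[0, 1])


pop, genotype, climate = simulate()

# Unadjusted association: driven entirely by population structure.
naive_r = correlation(genotype, climate)

# Association after removing the subpopulation effect from both variables.
adjusted_r = correlation(residualize(genotype, pop), residualize(climate, pop))

print(f"naive correlation:    {naive_r:.3f}")
print(f"adjusted correlation: {adjusted_r:.3f}")
```

In this toy setting the spurious association largely disappears once the subpopulation label is accounted for. A mixed model pursues the same goal when no discrete label is available, by letting a kinship matrix absorb the shared ancestry.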
The downside of this approach is that, in practice, it will remove many true genetic signals of local adaptation because of the inherent confounding between local genotype and adaptive phenotype. Instead, the primary signals from such analyses will be due to effects of alleles that exist in, and
PLOS Genetics | www.plosgenetics.org 1 December 2014 | Volume 10 | Issue 12 | e1004842