ARTICLE

Pooled Association Tests for Rare Variants in Exon-Resequencing Studies

Alkes L. Price,1,2,3,6 Gregory V. Kryukov,3,4,6 Paul I.W. de Bakker,3,4 Shaun M. Purcell,3,5 Jeff Staples,3,4 Lee-Jen Wei,2 and Shamil R. Sunyaev3,4,*

Deep sequencing will soon generate comprehensive sequence information in large disease samples. Although the power to detect association with an individual rare variant is limited, pooling variants by gene or pathway into a composite test provides an alternative strategy for identifying susceptibility genes. We describe a statistical method for detecting association of multiple rare variants in protein-coding genes with a quantitative or dichotomous trait. The approach is based on the regression of phenotypic values on individuals’ genotype scores subject to a variable allele-frequency threshold, incorporating computational predictions of the functional effects of missense variants. Statistical significance is assessed by permutation testing with variable thresholds. We used a rigorous population-genetics simulation framework to evaluate the power of the method, and we applied the method to empirical sequencing data from three disease studies.

Introduction

GWAS have successfully identified hundreds of loci harboring common variants that are reproducibly associated with complex traits. However, common variants identified to date typically explain only a small fraction of overall heritability, motivating interest in low-frequency or rare variants that may contribute to genetic risk. 1,2 Technological advances in high-throughput sequencing platforms will soon make it possible to extend association studies to low-frequency and rare variants, particularly in targeted resequencing of exons.
3,4 Rare variants are predicted to be enriched for functional alleles and to exhibit stronger effect sizes than common variants, consistent with the view that functional allelic variants are subject to purifying selection pressure. 5–7 Deep-resequencing studies of candidate genes have already demonstrated the effect of rare alleles on several complex traits of biomedical relevance. 8–14

The statistical power to detect phenotypic association with an individual rare variant is limited, due to the small number of observations for any given variant and a more stringent multiple-test correction as compared to common variants. This motivates analytical approaches that test the combined effect of multiple rare variants, but this requires prior specification of which variants to combine into the test.

To date, most candidate-gene resequencing studies have compared the number of individuals carrying alleles exclusive to either of the phenotypic extremes. This strategy effectively eliminates common alleles from the test because they would be present in individuals at both extremes unless they have enormous effect. For large sample sizes, however, limiting the association analysis to exclusive alleles may unnecessarily reduce the statistical power of the test.

A recently proposed approach is to pick a fixed allele-frequency threshold and perform an association test on the set of variants below that threshold, giving them each equal weight (more generally, variants can be collapsed into multiple frequency bins). 15 Another approach is to weight counts of each variant on the basis of the estimated variance under the null hypothesis of no association. 16 This scheme applies much higher weights to very rare variants, and it implicitly assumes that the log odds ratio is approximately inversely proportional to the square root of the allele frequency, as we show below.
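To make the two pooling schemes above concrete, the following is a minimal Python sketch, not the implementation used in references 15 or 16. It assumes genotypes are stored as an individuals-by-variants matrix of minor-allele counts (0, 1, or 2). The function names, the add-one pseudocount, and the use of all samples to estimate frequencies are illustrative assumptions. The variance-based weight is taken to be 1/sqrt(q(1 - q)) for a variant of frequency q, which for small q is close to 1/sqrt(q) and so mirrors the inverse-square-root relationship mentioned in the text.

```python
import math

def allele_frequencies(genotypes):
    """Per-variant allele frequency from an individuals x variants matrix of 0/1/2 counts."""
    n = len(genotypes)
    n_variants = len(genotypes[0])
    # Add-one pseudocount (an assumption of this sketch) keeps each frequency in (0, 1).
    return [(sum(row[j] for row in genotypes) + 1) / (2 * n + 2)
            for j in range(n_variants)]

def fixed_threshold_scores(genotypes, freqs, threshold):
    """Equal-weight pooling: count alleles at variants whose frequency is below the threshold."""
    keep = [f < threshold for f in freqs]
    return [sum(c for c, k in zip(row, keep) if k) for row in genotypes]

def variance_weighted_scores(genotypes, freqs):
    """Weighted pooling: each allele counts 1 / sqrt(q(1-q)), so rarer variants weigh more."""
    weights = [1.0 / math.sqrt(q * (1.0 - q)) for q in freqs]
    return [sum(c * w for c, w in zip(row, weights)) for row in genotypes]
```

Either function returns one genotype score per individual, which would then be compared between phenotype groups or regressed on the trait. The fixed-threshold version discards every variant above the cutoff, whereas the weighted version keeps all variants but lets the rarest ones dominate the score.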
Using population-genetics simulations informed by empirical sequencing data, we analyzed the relationship between the phenotypic effect and the allele frequency of a mutation within an evolutionary model that incorporates purifying selection. These simulations highlighted the potential value of a statistical approach that uses a variable allele-frequency threshold instead of a fixed threshold. We have implemented such an approach, assessing statistical significance by permutation testing with variable thresholds, and we show that this approach indeed improves statistical power in both simulated and empirical data sets. In particular, this approach does not make implicit assumptions about the relationship between allele frequency and odds ratio.

Next, we have incorporated computational predictions of the functional effect of amino acid changes 17,18 in the statistical test. The test gives higher weight to allelic variants predicted to be functionally significant and lower weight to variants predicted to be functionally insignificant. We show that incorporating computational predictions of functional importance further boosts power.

1 Department of Epidemiology, Harvard School of Public Health, Boston, MA 02115, USA
2 Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115, USA
3 Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
4 Division of Genetics, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, USA
5 Center for Human Genetic Research, Massachusetts General Hospital, Boston, MA 02114, USA
6 These authors contributed equally to this work
*Correspondence: ssunyaev@rics.bwh.harvard.edu
DOI 10.1016/j.ajhg.2010.04.005. ©2010 by The American Society of Human Genetics. All rights reserved.
The American Journal of Human Genetics 86, 832–838, June 11, 2010
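As a rough illustration of the variable-threshold idea outlined in the Introduction, the sketch below scans every observed allele-count threshold, keeps the largest association score, and calibrates that maximum by permuting phenotypes. This is a sketch under stated assumptions, not the authors' implementation. The particular score (pooled allele counts times centred phenotypes, scaled by the root sum of squared pooled counts), the one-sided comparison, the add-one permutation p value, and all function names are hypothetical choices made here. Functional-prediction weights are omitted.

```python
import math
import random

def burden_z(genotypes, phenotypes, keep):
    """Score for one threshold: pooled allele counts against mean-centred phenotypes."""
    mean_phen = sum(phenotypes) / len(phenotypes)
    burdens = [sum(c for c, k in zip(row, keep) if k) for row in genotypes]
    denom = math.sqrt(sum(b * b for b in burdens))
    if denom == 0:
        return 0.0  # no pooled alleles at this threshold
    return sum(b * (p - mean_phen) for b, p in zip(burdens, phenotypes)) / denom

def max_z_over_thresholds(genotypes, phenotypes, allele_counts):
    """Maximise the score over every distinct observed allele-count threshold."""
    best = -math.inf
    for t in sorted(set(allele_counts)):
        keep = [c <= t for c in allele_counts]
        best = max(best, burden_z(genotypes, phenotypes, keep))
    return best

def variable_threshold_test(genotypes, phenotypes, n_perm=1000, seed=0):
    """Return (observed maximum score, one-sided permutation p value)."""
    rng = random.Random(seed)
    n_variants = len(genotypes[0])
    allele_counts = [sum(row[j] for row in genotypes) for j in range(n_variants)]
    observed = max_z_over_thresholds(genotypes, phenotypes, allele_counts)
    shuffled = list(phenotypes)
    hits = 0
    for _ in range(n_perm):
        # Shuffling phenotypes breaks any genotype-phenotype link but keeps both margins.
        rng.shuffle(shuffled)
        if max_z_over_thresholds(genotypes, shuffled, allele_counts) >= observed:
            hits += 1
    # Add-one correction (an assumption here) so the p value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)
```

Because the maximum over thresholds is recomputed for each permuted data set, the resulting p value already accounts for having searched over thresholds, so no separate correction for the number of thresholds is applied in this sketch.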