Genetic Epidemiology (2011)

Uncovering the Total Heritability Explained by All True Susceptibility Variants in a Genome-Wide Association Study

Hon-Cheong So,1 Miaoxin Li,1 and Pak C. Sham1–3*

1 Department of Psychiatry, University of Hong Kong, Hong Kong SAR, China
2 Genome Research Centre, University of Hong Kong, Hong Kong SAR, China
3 The State Key Laboratory of Brain and Cognitive Sciences, University of Hong Kong, Hong Kong SAR, China

Genome-wide association studies (GWAS) have become increasingly popular in recent years and have contributed to the discovery of many susceptibility variants. However, a large proportion of the heritability remains unexplained. This observation raises questions about the ability of GWAS to uncover the genetic basis of complex diseases. In this study, we propose a simple and fast statistical framework to estimate the total heritability explained by all true susceptibility variants in a GWAS. Many true risk variants are expected to go undetected in a GWAS because of limited power, and the proposed framework aims to recover this “hidden” heritability. Importantly, only the summary z-statistics are required as input; no raw genotype data are needed. The strategy is to recover the true effect sizes from the observed z-statistics. The methodology does not rely on any distributional assumptions about the effect sizes of variants. Both binary and quantitative traits can be handled, and covariates may be included. Population-based or family-based designs are allowed as long as the summary statistics are available. Simulations showed satisfactory performance of the proposed approach. Application to real data (Crohn’s disease, HDL, LDL, and triglycerides) reveals that at least about 10–20% of the variance in liability or phenotype can be explained by GWAS panels. This translates to around 10–40% of the total heritability for the studied traits. Genet. Epidemiol. 2011. © 2011 Wiley-Liss, Inc.
Key words: association study; genetic architecture; common variants

Additional Supporting Information may be found in the online version of the article.
Contract grant sponsor: Hong Kong Research Grants Council General Research Fund; Contract grant numbers: HKU 766906M; HKU 774707M; Contract grant sponsor: University of Hong Kong Strategic Research Theme of Genomics.
*Correspondence to: Pak C. Sham, Department of Psychiatry, 10/F Laboratory Block, LKS Faculty of Medicine, University of Hong Kong, Pokfulam, Hong Kong SAR, China. E-mail: pcsham@hku.hk
Received 12 August 2010; Revised 5 April 2011; Accepted 14 April 2011
Published online in Wiley Online Library (wileyonlinelibrary.com). DOI: 10.1002/gepi.20593

INTRODUCTION

In the past few years, genome-wide association studies (GWAS) have become increasingly popular, and the approach has identified robust associations of many common genetic variants with more than 80 diseases [Manolio et al., 2008]. However, it has also been noted that the established common variants from GWAS generally account for only a small proportion of the total heritability [Manolio et al., 2009]. For example, more than 40 loci have been identified for height, but together they explain less than 5% of the variance [Visscher, 2008]. The problem of “missing heritability” has prompted much discussion, and it has been postulated that, for example, rare variants and structural variations may be responsible for some of it [Manolio et al., 2009]. The phenomenon of missing heritability leads one to question the ability of GWAS to uncover the genetic basis of diseases. It is natural to ask: what is the maximum heritability (i.e., variance in underlying liability or phenotype) that can be explained by all true susceptibility variants in a GWAS? To put it another way, assume the sample size is unlimited. We would then be able to detect all the disease susceptibility variants, regardless of how small their effects are.
What would the total heritability explained be in that case?

This issue has rarely been addressed, with the exceptions of a study by the International Schizophrenia Consortium (ISC) [The International Schizophrenia Consortium, 2009] and a very recent study by Yang et al. [2010], published around the time this paper was submitted. In the ISC study, the authors performed a GWAS on schizophrenia and proposed a “polygenic score” analysis based on the loci associated at different P-value thresholds, up to P = 0.5. The score analysis was performed on a pruned set of markers in approximate linkage equilibrium. The score was constructed from the discovery sample and applied to the target sample to see whether it could significantly predict disease status. Using males and females as discovery and target samples, they observed that the score from the discovery sample was significantly associated with the outcome in the target sample, and that the effect was more prominent for higher P-value thresholds, up to 0.5. The total proportion of variance explained by all risk alleles was then estimated by simulation. Briefly, they repeated the entire process above on simulated discovery and target samples under different genetic models. They determined which genetic models are