Genetic Epidemiology 36:882–889 (2012)

Power of IRT in GWAS: Successful QTL Mapping of Sum Score Phenotypes Depends on Interplay Between Risk Allele Frequency, Variance Explained by the Risk Allele, and Test Characteristics

Stéphanie M. van den Berg 1* and Susan K. Service 2

1 Department of Research Methodology, Measurement and Data Analysis, University of Twente, The Netherlands
2 Center for Neurobehavioral Genetics, University of California, Los Angeles, California

As data from sequencing studies in humans accumulate, rare genetic variants influencing liability to disease and disorders are expected to be identified. Three simulation studies show that characteristics and properties of diagnostic instruments interact with risk allele frequency to affect the power to detect a quantitative trait locus (QTL) based on a test score derived from symptom counts or questionnaire items. Clinical tests, that is, tests that show a positively skewed phenotypic sum score distribution in the general population, are optimal to find rare risk alleles of large effect. Tests that show a negatively skewed sum score distribution are optimal to find rare protective alleles of large effect. For alleles of small effect, tests with normally distributed item parameters give best power for a wide range of allele frequencies. The item-response theory framework can help understand why an existing measurement instrument has more power to detect risk alleles with either low or high frequency, or both kinds. Genet. Epidemiol. 36:882–889, 2012. © 2012 Wiley Periodicals, Inc.

Key words: item-response theory (IRT); measurement; statistical power; extreme samples design; case-control design; population sample design

Correspondence to: Stéphanie M. van den Berg, Department of Research Methodology, Faculty of Behavioral Sciences, University of Twente, Measurement and Data Analysis (OMD), P.O. Box 217, 7500 AE Enschede, The Netherlands.
E-mail: stephanie.vandenberg@utwente.nl
Received 13 July 2012; Revised 13 July 2012; Accepted 3 August 2012
Published online 10 September 2012 in Wiley Online Library (wileyonlinelibrary.com/journal/gepi). DOI: 10.1002/gepi.21680

INTRODUCTION

Many diagnostic instruments for a disorder consist of symptom counts. Often the disorder can be seen as the extreme tail of a continuous liability trait: the higher the liability, the more likely a subject shows certain symptoms. High-liability persons will have many symptoms and, on the basis of a diagnostic criterion, are then labeled as affected with the disorder of interest. In some instances, genome-wide association (GWA) studies are applied to symptom count data, rather than diagnosis, as this allows more information to be used [Van der Sluis et al., 2012].

Item-response theory (IRT) provides a formal statistical framework for modeling liability and diagnosis, and its application in the medical sciences is increasing [Reise and Waller, 2009]. IRT has also been successfully applied in genetics [Eaves et al., 2005; Van den Berg et al., 2007, 2010; Van Leeuwen et al., 2008]. It provides a useful framework for understanding the relationship between measurement problems and problems in detecting genetic variants. For example, using this IRT framework, Van der Sluis et al. [2010] showed that ignored multidimensionality, measurement bias, and poor reliability can result in poor statistical power in QTL-mapping studies.

Here, we show how power is associated with test characteristics using different study designs, and how this association is moderated by allele frequency. We link the simulation results to the IRT concept of “test information.” We start out with a brief introduction to IRT and so-called test information functions (TIFs). Next, we describe how this framework makes predictions about statistical power in QTL mapping.
Three simulation studies demonstrate the intricate relationship between study design, allele frequency, and the TIF.

ITEM-RESPONSE THEORY

IRT models item data as a function of both item characteristics and person characteristics [Embretson and Reise, 2000; Lord, 1980; Lord and Novick, 1968]. An item can be anything from a symptom that is scored in a diagnostic interview as being either present or absent, to an item on a self-report questionnaire that can be answered with yes or no. Items do not have to be dichotomous (i.e., yes/no, or 1/0), but for clarity of exposition, we focus on dichotomous items in our descriptions. In the Discussion, we expand on alternative data types.

The one-parameter logistic IRT model for dichotomous items, or so-called Rasch model, is

$$P(X_{ij} = 1 \mid \theta_i, \beta_j) = \frac{1}{1 + \exp(\beta_j - \theta_i)} \tag{1}$$

where $P(X_{ij} = 1)$ is the probability of a positive response for person $i$ on item $j$ (or the presence of symptom $j$). Parameter $\theta_i$ is the person parameter for person $i$ and can be thought of