Genetic Epidemiology

RESEARCH ARTICLE

Longitudinal Data Analysis for Genetic Studies in the Whole-Genome Sequencing Era

Zheyang Wu,1 Yijuan Hu,2 and Phillip E. Melton3

1 Department of Mathematical Sciences, Worcester Polytechnic Institute, Worcester, Massachusetts, United States of America
2 Department of Biostatistics and Bioinformatics, Emory University, Atlanta, Georgia, United States of America
3 Centre for Genetic Origins of Health and Disease, University of Western Australia, Crawley, Australia

Published online in Wiley Online Library (wileyonlinelibrary.com). DOI 10.1002/gepi.21829

ABSTRACT: The analysis of whole-genome sequence (WGS) data using longitudinal phenotypes offers a potentially rich resource for the examination of the genetic variants and their covariates that affect complex phenotypes over time. We summarize eight contributions to the Genetic Analysis Workshop 18, which applied a diverse array of statistical genetic methods to analyze WGS data in combination with data from genome-wide association studies (GWAS) from up to four different time points on blood pressure phenotypes. The common goal of these analyses was to develop and apply appropriate methods that utilize longitudinal repeated measures to potentially increase the analytic efficiency of WGS and GWAS data. These diverse methods can be grouped into two categories, based on the way they model dependence structures: (1) linear mixed-effects (LME) models, where the random effect terms in the linear models are used to capture the dependence structures; and (2) variance-components models, where the dependence structures are constructed directly based on multiple components of variance-covariance matrices for the multivariate Gaussian responses.
Despite the heterogeneous nature of these analytical methods, the group came to the following conclusions: (1) the use of repeat measurements can increase power to identify variants associated with the phenotype; (2) the inclusion of family data may help correct genotyping errors and allow for more accurate detection of rare variants than using unrelated individuals only; and (3) fitting mixed-effects and variance-components models for longitudinal data presents computational challenges. The challenges and computational burden demanded by WGS data were addressed in the eight contributions.

Genet Epidemiol 38:S74–S80, 2014. © 2014 Wiley Periodicals, Inc.

KEY WORDS: rare variants; longitudinal data; repeat measurements; whole-genome sequencing; family studies

The authors contributed equally to this article.
Correspondence to: Phillip E. Melton, Centre for Genetic Origins of Health and Disease, University of Western Australia, 35 Stirling Hwy (M409), Crawley, WA 6009, Australia. E-mail: phillip.melton@uwa.edu.au

Introduction

Traditional analyses of genetic variants that influence complex traits focus on phenotypes and covariate measurements from a single time point (i.e., a cross-sectional study). Although genetic variants are essentially fixed, quantitative disease traits and their associated risk factors typically vary over time. Recent genetic association studies have been performed on longitudinal cohorts to take advantage of repeat measurements of time-varying variables [Das et al., 2011; Fan et al., 2012; Furlotte et al., 2012]. Longitudinal analysis in genetic studies offers several advantages. First, repeat measurements can reduce type I error and thus increase statistical power compared to a single measurement. This is appealing for genetic studies because it requires no additional genotyping. Second, longitudinal data provide opportunities to identify genetic determinants for age of onset and subsequent progression of complex traits. Finally, longitudinal studies permit the prospective measurement of time-varying covariates that are not typically included in traditional genetic studies. Analysis of longitudinal data in genetic studies also faces two main challenges. The first is statistical: because many genetic studies are family based, the analysis must additionally account for within-family correlation. The second is computational: advanced statistical methods developed for epidemiological studies, including generalized estimating equations and mixed models, that account for large pedigree structures may not scale to whole-genome sequence (WGS) data.

Implementation of longitudinal data analysis in genetic studies has increased recently in order to exploit the additional information and to increase statistical power [Smith et al., 2010]. Participants in two previous Genetic Analysis Workshops (GAWs), GAW13 and GAW16, applied statistical genetic methods to observed and simulated longitudinal phenotype data based on the Framingham Heart Study, which prompted development of a number of analytical strategies [Gauderman et al., 2003; Kerner et al., 2009]. However, these strategies focused primarily on linkage or association analysis of common variants. Their implementations are of limited use in WGS studies that focus on rare variant identification. The GAW18 working group on analysis of longitudinal data using sequencing and GWAS includes eight contributions that used WGS data or datasets from WGS studies