Genetic Epidemiology
RESEARCH ARTICLE
Longitudinal Data Analysis for Genetic Studies in the Whole-Genome Sequencing Era
Zheyang Wu,1† Yijuan Hu,2† and Phillip E. Melton3∗
1 Department of Mathematical Sciences, Worcester Polytechnic Institute, Worcester, Massachusetts, United States of America;
2 Department of Biostatistics and Bioinformatics, Emory University, Atlanta, Georgia, United States of America;
3 Centre for Genetic Origins of Health and Disease, University of Western Australia, Crawley, Australia
Published online in Wiley Online Library (wileyonlinelibrary.com). DOI 10.1002/gepi.21829
ABSTRACT: The analysis of whole-genome sequence (WGS) data using longitudinal phenotypes offers a potentially rich
resource for the examination of the genetic variants and their covariates that affect complex phenotypes over time. We
summarize eight contributions to the Genetic Analysis Workshop 18, which applied a diverse array of statistical genetic
methods to analyze WGS data in combination with data from genome-wide association studies (GWAS) from up to four
different time points on blood pressure phenotypes. The common goal of these analyses was to develop and apply appropriate
methods that utilize longitudinal repeated measures to potentially increase the analytic efficiency of WGS and GWAS data.
These diverse methods can be grouped into two categories, based on the way they model dependence structures: (1) linear
mixed-effects (LME) models, where the random effect terms in the linear models are used to capture the dependence structures;
and (2) variance-components models, where the dependence structures are constructed directly based on multiple components
of variance-covariance matrices for the multivariate Gaussian responses. Despite the heterogeneous nature of these analytical
methods, the group came to the following conclusions: (1) the use of repeat measurements can increase the power to identify variants
associated with the phenotype; (2) the inclusion of family data may correct genotyping errors and allow for more accurate
detection of rare variants than using unrelated individuals only; and (3) fitting mixed-effects and variance-components models
for longitudinal data presents computational challenges. The challenges and computational burden demanded by WGS data
were addressed in the eight contributions.
Genet Epidemiol 38:S74–S80, 2014. © 2014 Wiley Periodicals, Inc.
KEY WORDS: rare variants; longitudinal data; repeat measurements; whole-genome sequencing; family studies
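The two model categories named in the abstract can be written generically as follows. This is an illustrative sketch only: the notation is an assumption introduced here and is not taken from any of the eight contributions. Here i indexes subjects, j indexes time points, g_i is a genotype score, b_i is a subject-level random intercept, and the K_k are known matrices (for example, a kinship-based matrix and an identity matrix).

```latex
% (1) Linear mixed-effects model: random effects induce the dependence.
y_{ij} = \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + g_i\,\gamma + b_i + \varepsilon_{ij},
\qquad b_i \sim N(0, \sigma_b^2), \quad \varepsilon_{ij} \sim N(0, \sigma_e^2)

% Implied within-subject covariance for j \neq j':
\operatorname{Cov}(y_{ij}, y_{ij'}) = \sigma_b^2

% (2) Variance-components model: the covariance of the stacked
% Gaussian response vector is specified directly from its components.
\mathbf{y} \sim N\!\left(\mathbf{X}\boldsymbol{\beta} + \mathbf{g}\gamma,\; \boldsymbol{\Sigma}\right),
\qquad \boldsymbol{\Sigma} = \sum_{k=1}^{K} \sigma_k^2\, \mathbf{K}_k
```

In (1) the dependence between measurements arises indirectly through the shared random effect, whereas in (2) it is built directly into the covariance matrix of the response.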
Introduction
Traditional analyses of genetic variants that influence complex traits focus on phenotypes and covariate measurements from a single time point (i.e., cross-sectional study). Although genetic variants are essentially fixed, the quantitative disease traits and their associated risk factors mostly vary with time. Recent genetic association studies have been performed on longitudinal cohorts to take advantage of repeat measurements of time-varying variables [Das et al., 2011; Fan et al., 2012; Furlotte et al., 2012]. Longitudinal analysis in genetic
studies offers several advantages. First, repeat measurements can reduce type II error and thus increase statistical power compared to a single measurement. This is appealing for genetic studies because it requires no additional genotyping.
Second, longitudinal data provide opportunities to identify genetic determinants for age of onset and subsequent progression of complex traits. Finally, longitudinal studies permit the prospective measurement of time-varying covariates that are not typically included in traditional genetic studies. Analysis of longitudinal data in genetic studies also faces two main challenges. The first challenge is statistical: because many genetic studies are family based, the analysis must additionally account for within-family correlation. The second challenge is computational: advanced statistical methods developed for epidemiological studies that account for large pedigree structure, including generalized estimating equations and mixed models, may not scale to whole-genome sequence (WGS) data.
† The authors contributed equally to this article.
∗ Correspondence to: Phillip E. Melton, Centre for Genetic Origins of Health and Disease, University of Western Australia, 35 Stirling Hwy (M409), Crawley, WA 6009, Australia. E-mail: phillip.melton@uwa.edu.au
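The power argument for repeat measurements made earlier in this section can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the method of any GAW18 contribution: it assumes unrelated individuals, a random-intercept phenotype model, and a simple regression of each subject's mean phenotype on genotype. The function names, parameter values, and the normal critical value of 1.96 are all hypothetical choices made for illustration.

```python
import numpy as np


def subject_mean_variance(sd_subject, sd_noise, n_times):
    """Residual variance of a subject's mean over n_times measurements.

    Assumes phenotype = genetic effect + subject intercept + occasion noise,
    so only the occasion-noise part shrinks with more measurements.
    """
    return sd_subject**2 + sd_noise**2 / n_times


def estimate_power(n_times, n_subjects=500, beta=0.15, maf=0.3,
                   sd_subject=0.5, sd_noise=1.0, n_sims=400,
                   z_crit=1.96, seed=0):
    """Empirical power of a genotype-phenotype test on subject means.

    All defaults are illustrative assumptions. The test is an ordinary
    least-squares slope test with a normal approximation (z_crit).
    """
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_sims):
        # Additive genotype coding (0, 1, or 2 copies of the minor allele).
        g = rng.binomial(2, maf, size=n_subjects).astype(float)
        # Subject-specific random intercept shared across time points.
        b = rng.normal(0.0, sd_subject, size=n_subjects)
        # Independent measurement noise at each time point.
        e = rng.normal(0.0, sd_noise, size=(n_subjects, n_times))
        # Average the repeat measurements within each subject.
        y_bar = beta * g + b + e.mean(axis=1)

        # Simple linear regression of the subject mean on genotype.
        gc = g - g.mean()
        yc = y_bar - y_bar.mean()
        sxx = np.sum(gc**2)
        slope = np.sum(gc * yc) / sxx
        resid = yc - slope * gc
        se = np.sqrt(np.sum(resid**2) / (n_subjects - 2) / sxx)
        if abs(slope / se) > z_crit:
            hits += 1
    return hits / n_sims


if __name__ == "__main__":
    for t in (1, 2, 4):
        print(t, subject_mean_variance(0.5, 1.0, t), estimate_power(t))
```

Under these assumptions the gain comes entirely from shrinking the occasion-specific noise; the between-subject variance is unaffected by additional time points, so the benefit of further measurements diminishes. A full analysis of family data would also need to model within-family correlation, which this sketch deliberately omits.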
Implementation of longitudinal data analysis in genetic studies has increased recently in order to exploit the additional information and to increase statistical power [Smith et al., 2010]. Participants in two previous Genetic Analysis Workshops (GAWs), GAW13 and GAW16, applied statistical genetic methods to observed and simulated longitudinal phenotype data based on the Framingham Heart Study, which prompted development of a number of analytical strategies [Gauderman et al., 2003; Kerner et al., 2009]. However, these strategies focused primarily on linkage or association analysis of common variants. The implementations are limited in the context of WGS studies that focus on rare variant identification. The GAW18 working group on analysis of longitudinal data using sequencing and GWAS includes eight contributions that used WGS data or datasets from WGS studies