American Journal of Applied Mathematics and Statistics, 2016, Vol. 4, No. 3, 59-66
Available online at http://pubs.sciepub.com/ajams/4/3/1
© Science and Education Publishing
DOI:10.12691/ajams-4-3-1
Maximum Likelihood Approach for Longitudinal
Models with Nonignorable Missing Data Mechanism
Using Fractional Imputation
Abdallah S. A. Yaseen¹, Ahmed M. Gad²,*, Abeer S. Ahmed¹
¹ The National Centre for Social and Criminological Research, Cairo, Egypt
² Statistics Department, Faculty of Economics and Political Science, Cairo University, Egypt
*Corresponding author: dr_ahmedgad@yahoo.co.uk
Abstract In longitudinal studies, data are collected on the same set of units on two or more occasions. This is in
contrast to cross-sectional studies, where a single outcome is measured for each individual. Some intended
measurements might not be available for some units, resulting in a missing data setting. When the probability of
missingness depends on the missing values, the missing data mechanism is termed nonrandom. One common
missingness pattern is dropout, where a missing value is never followed by an observed value. Under nonrandom
dropout, the missing data mechanism must be included in the analysis to obtain unbiased estimates. The parametric
fractional imputation method is proposed to handle missingness in longitudinal studies and to obtain unbiased
estimates in the presence of a nonrandom dropout mechanism. In this setting, the jackknife replication method is
used to find the standard errors of the fractionally imputed estimates. Finally, the proposed method is evaluated in a
simulation study and applied to a real data set (the mastitis data).
Keywords: longitudinal data, mastitis data, missing data, nonrandom dropout, parametric fractional imputation,
repeated measures, standard errors
Cite This Article: Abdallah S. A. Yaseen, Ahmed M. Gad, and Abeer S. Ahmed, “Maximum Likelihood
Approach for Longitudinal Models with Nonignorable Missing Data Mechanism Using Fractional Imputation.”
American Journal of Applied Mathematics and Statistics, vol. 4, no. 3 (2016): 59-66. doi: 10.12691/ajams-4-3-1.
1. Introduction
The defining characteristic of longitudinal studies is
that sample units are measured repeatedly over time. That
is, data are collected on the same set of units on two or
more occasions. Missing values are not uncommon with
longitudinal data.
Missing data mechanisms can be classified according to
the process causing missingness, as defined by Little and
Rubin [17]. The classes are missing completely at random
(MCAR), missing at random (MAR) and missing not at
random (MNAR). The MNAR mechanism is also termed a
nonignorable missing data mechanism. In this case the
missing data mechanism must be included in the analysis
to obtain unbiased estimates.
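In symbols, the three mechanisms can be sketched as follows. The notation is assumed here for illustration and follows the usual convention: Y = (Y_obs, Y_mis) is the complete data split into observed and missing parts, R is the vector of missingness indicators, and ψ denotes the parameters of the missingness process.

```latex
\begin{align*}
\text{MCAR:}\quad & P(R \mid Y_{\mathrm{obs}}, Y_{\mathrm{mis}}, \psi) = P(R \mid \psi), \\
\text{MAR:}\quad  & P(R \mid Y_{\mathrm{obs}}, Y_{\mathrm{mis}}, \psi) = P(R \mid Y_{\mathrm{obs}}, \psi), \\
\text{MNAR:}\quad & P(R \mid Y_{\mathrm{obs}}, Y_{\mathrm{mis}}, \psi) \ \text{depends on } Y_{\mathrm{mis}}.
\end{align*}
```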
Another important classification concerns the missingness
pattern: dropout or intermittent. In the dropout pattern, a
subject who leaves the study at some time point does not
appear again, that is, a missing value is never followed by
an observed value. In the intermittent pattern, a
missing value may be followed by an observed value.
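The distinction between the two patterns can be checked mechanically for each subject. The following is a minimal sketch; the function name `classify_pattern` and the encoding of the response indicators (True for observed, False for missing) are assumptions made for illustration only.

```python
from typing import Sequence


def classify_pattern(observed: Sequence[bool]) -> str:
    """Classify one subject's response indicators (True = observed).

    Returns "complete", "dropout" or "intermittent".
    """
    obs = list(observed)
    if all(obs):
        return "complete"
    first_missing = obs.index(False)
    # Dropout: once a value is missing, no later value is observed.
    if not any(obs[first_missing:]):
        return "dropout"
    # Otherwise some missing value is followed by an observed one.
    return "intermittent"


# A subject measured at four occasions who leaves after the second one.
pattern = classify_pattern([True, True, False, False])
```

Under this encoding, a subject observed at occasions 1 and 3 but not 2 would be classified as intermittent.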
Handling missing data requires jointly modeling the
longitudinal outcome and the missing data process. There
are several approaches to parametric modeling of the
longitudinal outcome and the missing data process. The
first is selection models [6]. Selection models are the
better choice if interest lies in inference about the
marginal distribution of the response. This is why we choose
such models in this article. The second is pattern
mixture models [19]. The third is shared parameter
models [8]. For more details, refer to Molenberghs and
Fitzmaurice [22].
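The three families correspond to the standard factorizations of the joint density of the outcome and the missingness process. The symbols are assumed here for illustration: y is the longitudinal outcome, r the missingness indicators, b a vector of shared random effects, and θ, ψ the model parameters.

```latex
\begin{align*}
\text{Selection:}\quad        & f(y, r \mid \theta, \psi) = f(y \mid \theta)\, f(r \mid y, \psi), \\
\text{Pattern mixture:}\quad  & f(y, r \mid \theta, \psi) = f(y \mid r, \theta)\, f(r \mid \psi), \\
\text{Shared parameter:}\quad & f(y, r \mid \theta, \psi) = \int f(y \mid b, \theta)\, f(r \mid b, \psi)\, f(b)\, db.
\end{align*}
```

In the selection factorization the first factor is the marginal density of the response, which is what makes this family convenient for marginal inference.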
The stochastic EM (SEM) algorithm, suggested by
Celeux and Diebolt [2], was developed to facilitate
the E-step of the EM algorithm. It has been extended to
longitudinal studies by Gad and Ahmed [9]. Other
alternatives include the stochastic approximation EM
(SAEM) algorithm [5] and the Monte Carlo EM (MCEM)
algorithm [25]. Booth and Hobert [1] used an automated
Monte Carlo EM algorithm to compute the E-step of the
EM algorithm. A disadvantage of the MCEM algorithm is
that the generated values are updated at each iteration,
which requires heavy computation and slows
convergence. In addition, convergence is not
guaranteed for a fixed Monte Carlo sample size [26].
Thus, the MCEM algorithm has been modified using
parametric fractional imputation to facilitate the
expectation step. This can also speed up convergence and
guarantee that the algorithm converges [14,15,16,27].
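The contrast with MCEM can be illustrated with a minimal sketch of the weighting step, as we read it from the fractional imputation literature: imputed values are drawn once from a proposal density and kept fixed, and only their fractional weights are recomputed as the parameter estimate changes. The function names, the normal proposal and the toy target densities below are hypothetical and are not the authors' implementation; in the nonignorable setting the target would be the conditional density of the missing values given the observed data and the response indicators.

```python
import math
import random


def normal_logpdf(x, mean, sd):
    """Log density of N(mean, sd^2) at x."""
    return -0.5 * math.log(2 * math.pi) - math.log(sd) - 0.5 * ((x - mean) / sd) ** 2


def fractional_weights(imputed, log_target, log_proposal):
    """Normalized weights w_j proportional to target(y_j) / proposal(y_j)."""
    log_w = [log_target(y) - log_proposal(y) for y in imputed]
    shift = max(log_w)  # subtract the maximum for numerical stability
    w = [math.exp(lw - shift) for lw in log_w]
    total = sum(w)
    return [wj / total for wj in w]


random.seed(0)
# Draw M = 200 imputed values once from a fixed proposal N(0, 2^2) (assumed).
imputed = [random.gauss(0.0, 2.0) for _ in range(200)]


def log_proposal(y):
    return normal_logpdf(y, 0.0, 2.0)


# Across iterations the parameter estimate changes; the imputed values do not.
weighted_means = []
for mean_t in (0.0, 0.5, 1.0):
    weights = fractional_weights(
        imputed, lambda y: normal_logpdf(y, mean_t, 1.0), log_proposal
    )
    weighted_means.append(sum(w * y for w, y in zip(weights, imputed)))
```

Because the imputed values are reused, each iteration only re-evaluates densities rather than regenerating a Monte Carlo sample, which is the source of the computational saving described above.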
Kim and Kim [16] applied parametric fractional
imputation in the context of cross-sectional studies to deal
with missingness under a nonignorable
missing data mechanism. Yang et al. [27] generalized the