American Journal of Applied Mathematics and Statistics, 2016, Vol. 4, No. 3, 59-66
Available online at http://pubs.sciepub.com/ajams/4/3/1
© Science and Education Publishing
DOI:10.12691/ajams-4-3-1

Maximum Likelihood Approach for Longitudinal Models with Nonignorable Missing Data Mechanism Using Fractional Imputation

Abdallah S. A. Yaseen 1, Ahmed M. Gad 2,*, Abeer S. Ahmed 1
1 The National Centre for Social and Criminological Research, Cairo, Egypt
2 Statistics Department, Faculty of Economics and Political Science, Cairo University, Egypt
*Corresponding author: dr_ahmedgad@yahoo.co.uk

Abstract In longitudinal studies, data are collected for the same set of units on two or more occasions. This is in contrast to cross-sectional studies, where a single outcome is measured for each individual. Some intended measurements might not be available for some units, resulting in a missing data setting. When the probability of missingness depends on the missing values, the missing data mechanism is termed nonrandom. One common missingness pattern is dropout, where a missing value is never followed by an observed value. Under nonrandom dropout, the missing data mechanism must be included in the analysis to get unbiased estimates. The parametric fractional imputation method is proposed to handle the missingness problem in longitudinal studies and to get unbiased estimates in the presence of a nonrandom dropout mechanism. In this setting, the jackknife replication method is used to find the standard errors of the fractionally imputed estimates. Finally, the proposed method is applied to a real data set (the mastitis data), in addition to a simulation study.

Keywords: longitudinal data, mastitis data, missing data, nonrandom dropout, parametric fractional imputation, repeated measures, standard errors

Cite This Article: Abdallah S. A. Yaseen, Ahmed M. Gad, and Abeer S.
Ahmed, “Maximum Likelihood Approach for Longitudinal Models with Nonignorable Missing Data Mechanism Using Fractional Imputation.” American Journal of Applied Mathematics and Statistics, vol. 4, no. 3 (2016): 59-66. doi: 10.12691/ajams-4-3-1.

1. Introduction

The defining characteristic of longitudinal studies is that sample units are measured repeatedly over time. That is, data are collected for the same set of units on two or more occasions. Missing values are not uncommon in longitudinal data. Missing data mechanisms can be classified according to the process causing the missingness, as defined by Little and Rubin [17]. The classes are missing completely at random (MCAR), missing at random (MAR) and missing not at random (MNAR). The MNAR mechanism is termed the nonignorable missing data mechanism. In this case the missing data mechanism must be included in the analysis in order to get unbiased estimates.

Another important classification concerns the missingness pattern: dropout or intermittent. In the dropout pattern, a subject who leaves the study at some time point does not appear again; that is, a missing value is never followed by an observed value. In the intermittent pattern, a missing value may be followed by an observed value.

Handling missing data requires jointly modeling the longitudinal outcome and the missing data process. There are several approaches to parametric modeling of the longitudinal outcome and the missing data process. The first is the selection model [6]. Selection models are the better choice when interest lies in inference about the marginal distribution of the response. This is why we choose such models in this article. The second is the pattern-mixture model [19]. The third is the shared-parameter model [8]. For more details, refer to Molenberghs and Fitzmaurice [22].

The stochastic EM algorithm (SEM), suggested by Celeux and Diebolt [2], has been developed to facilitate the E-step of the EM algorithm.
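The three modeling families listed above differ in how they factor the joint density of the response and the missingness indicators. The display below is a sketch in generic notation that is assumed here rather than taken from this article: $y_i$ is the complete response vector for unit $i$, $r_i$ is its vector of missingness indicators, $b_i$ is a vector of unit-level random effects, and $\theta$ and $\psi$ are the parameters of the response model and the missingness model, respectively.

```latex
% Assumed generic notation (not taken from the article):
%   y_i   complete response vector of unit i
%   r_i   missingness indicators of unit i
%   b_i   random effects shared by both processes
%   theta, psi   response-model and missingness-model parameters
\begin{align*}
\text{Selection model:} \quad
  & f(y_i, r_i \mid \theta, \psi) = f(y_i \mid \theta)\, f(r_i \mid y_i, \psi), \\
\text{Pattern-mixture model:} \quad
  & f(y_i, r_i \mid \theta, \psi) = f(y_i \mid r_i, \theta)\, f(r_i \mid \psi), \\
\text{Shared-parameter model:} \quad
  & f(y_i, r_i \mid \theta, \psi) = \int f(y_i \mid b_i, \theta)\, f(r_i \mid b_i, \psi)\, f(b_i)\, db_i .
\end{align*}
```

In the selection factorization the marginal density $f(y_i \mid \theta)$ appears directly, which is consistent with the preference for selection models when the marginal distribution of the response is of interest. Writing $y_i = (y_{i,\mathrm{obs}}, y_{i,\mathrm{mis}})$, the mechanism is MCAR when $f(r_i \mid y_i, \psi)$ does not depend on $y_i$, MAR when it depends on $y_i$ only through $y_{i,\mathrm{obs}}$, and MNAR when it also depends on $y_{i,\mathrm{mis}}$.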
The stochastic EM algorithm has been extended to longitudinal studies by Gad and Ahmed [9]. Other alternatives include the stochastic approximation EM (SAEM) algorithm [5] and the Monte Carlo EM (MCEM) algorithm [25]. Booth and Hobert [1] used an automated Monte Carlo EM algorithm to compute the E-step of the EM algorithm. A disadvantage of the MCEM algorithm is that the generated values are updated at each iteration, which requires heavy computation and therefore slows convergence. In addition, convergence is not guaranteed for a fixed Monte Carlo sample size [26]. Thus, the MCEM algorithm is developed using parametric fractional imputation to facilitate the expectation step. This can also speed up convergence and guarantee that convergence is attained [14,15,16,27].

Kim and Kim [16] applied parametric fractional imputation in the context of cross-sectional studies to deal with the missingness problem under a nonignorable missing data mechanism. Yang et al. [27] generalized the