Model Based Clustering for Longitudinal Data

Rolando De la Cruz-Mesía*, Fernando A. Quintana, Guillermo Marshall†

April 2, 2007

Abstract

A model-based clustering method is proposed for clustering individuals on the basis of measurements taken over time. Data variability is taken into account through non-linear hierarchical models, leading to a mixture of hierarchical models. We study both frequentist and Bayesian estimation procedures. From a classical viewpoint, we discuss maximum likelihood estimation of this family of models through the EM algorithm. From a Bayesian standpoint, we develop appropriate Markov chain Monte Carlo (MCMC) sampling schemes for the exploration of the target posterior distribution of the parameters. The methods are illustrated with the identification of hormone trajectories that are likely to lead to adverse pregnancy outcomes in a group of pregnant women.

Keywords: EM-algorithm, Cluster analysis, Markov chain Monte Carlo, Mixture model, Non-linear models, Random effects.

1 Introduction

The use of mixture models for clustering is sometimes referred to as model-based probabilistic clustering (Fraley and Raftery, 1998, 2002), since a particular functional form for the component densities must be assumed. Finite mixture models are widely used for clustering data in a variety of applications (see McLachlan and Basford, 1988). Many standard clustering algorithms are based on the assumption that the vectors to be clustered are realizations of random vectors from some parametric statistical model. These models usually place no restriction on the mean structure via covariates or otherwise. However, in many applications there is potential for parsimonious representation of the mean. For example, medical studies often yield time series-type data where each d-dimensional vector consists of measurements at d different time points.
* Departamento de Salud Pública, Facultad de Medicina, Pontificia Universidad Católica de Chile, Marcoleta 434, Santiago, Casilla 114D, CHILE. rolando@med.puc.cl. Partially funded by grant FONDECYT 3060071.

† Departamento de Estadística, Facultad de Matemáticas, Pontificia Universidad Católica de Chile, Casilla 306, Correo 22, Santiago, CHILE. {quintana,gm}@mat.puc.cl. Partially funded by grants FONDECYT 1060729 and 1060721.

In such cases, it seems natural to model the mean via regression and we will show that