Bootstrap methods for measuring classification uncertainty in latent class analysis

José G. Dias (1) and Jeroen K. Vermunt (2)

(1) ISCTE – Higher Institute of Social Sciences and Business Studies, Edifício ISCTE, Av. das Forças Armadas, 1649–026 Lisboa, Portugal, jose.dias@iscte.pt
(2) Department of Methodology and Statistics, Tilburg University, P.O. Box 90153, 5000 LE Tilburg, The Netherlands, J.K.Vermunt@uvt.nl

This paper addresses the issue of classification uncertainty in latent class analysis. It proposes a new bootstrap-based approach for quantifying the level of classification uncertainty at both the individual and the aggregate level. The procedure is illustrated by means of two applications.

1 Introduction

Model-based clustering by latent class (LC) models can be formulated as follows. Let $y$ denote a $J$-dimensional observation and $D = \{y_1, \ldots, y_n\}$ a sample of size $n$. Each data point is assumed to be a realization of the random variable $Y$ coming from an $S$-component mixture probability density function (p.d.f.)

$$f(y_i; \varphi) = \sum_{s=1}^{S} \pi_s f_s(y_i; \theta_s), \tag{1}$$

where $\pi_s$ are positive mixing proportions that sum to one, $\theta_s$ are the parameters defining the conditional distribution $f_s(y_i; \theta_s)$ for component $s$, and $\varphi = \{\pi_1, \ldots, \pi_{S-1}, \theta_1, \ldots, \theta_S\}$. Note that $\pi_S = 1 - \sum_{s=1}^{S-1} \pi_s$. Assuming i.i.d. observations, the log-likelihood function for an LC model has the form

$$\ell(\varphi; y) = \sum_{i=1}^{n} \log f(y_i; \varphi),$$

which is straightforward to maximize, yielding the maximum likelihood estimator (MLE), by the EM algorithm [DLR77].

Our results concern standard LC models; that is, mixtures of independent multinomial distributions [Clo95, VM03]. For nominal data, let $Y_j$ have $L_j$ categories, i.e., $y_{ij} \in \{1, \ldots, L_j\}$.
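As an illustration of the mixture density (1) and the log-likelihood above, the following minimal Python sketch evaluates both for component densities that have already been computed. The function names `full_proportions` and `mixture_log_likelihood`, and the layout of `component_densities` as one row per observation and one column per component, are assumptions made for this example and do not come from the paper.

```python
import math


def full_proportions(free_pi):
    """Return all S mixing proportions given the S-1 free ones.

    The last proportion is fixed by the sum-to-one constraint:
    pi_S = 1 - sum_{s=1}^{S-1} pi_s.
    """
    last = 1.0 - sum(free_pi)
    if last <= 0 or any(p <= 0 for p in free_pi):
        raise ValueError("mixing proportions must be positive and sum to one")
    return list(free_pi) + [last]


def mixture_log_likelihood(free_pi, component_densities):
    """Log-likelihood of an S-component mixture.

    component_densities[i][s] holds f_s(y_i; theta_s), evaluated beforehand
    (hypothetical input layout chosen for this sketch).
    """
    pi = full_proportions(free_pi)
    total = 0.0
    for row in component_densities:
        # Equation (1): f(y_i; phi) = sum_s pi_s * f_s(y_i; theta_s)
        mixture_density = sum(p * f for p, f in zip(pi, row))
        total += math.log(mixture_density)
    return total


if __name__ == "__main__":
    # Two components, two observations; only pi_1 = 0.4 is a free parameter.
    densities = [[0.5, 0.25], [0.1, 0.2]]
    print(mixture_log_likelihood([0.4], densities))
```

Passing only the $S-1$ free proportions mirrors the parameterization of $\varphi$ in the text; for long products of small probabilities, a log-space implementation would be numerically safer than this direct form.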
The standard LC model with $S$ latent classes is obtained by defining the conditional density as

$$f_s(y_i; \theta_s) = \prod_{j=1}^{J} \prod_{l=1}^{L_j} \theta_{sjl}^{I(y_{ij} = l)},$$

where $\theta_{sjl}$ denotes the probability that an observation belonging to latent class $s$ gives response $l$ on variable $j$, and where $I(y_{ij} = l)$