Computational Statistics and Data Analysis
Latent profile analysis with nonnormal mixtures: A Monte
Carlo examination of model selection using fit indices
Grant B. Morgan a,∗, Kari J. Hodge a,1, Aaron R. Baggett b,1
a Department of Educational Psychology, Baylor University, One Bear Place #97301, Waco, TX 76798-7301, USA
b Department of Psychology, University of Mary Hardin-Baylor, Box 8014, Belton, TX 76513-8014, USA
Article info
Article history:
Received 30 April 2014
Received in revised form 27 February 2015
Accepted 28 February 2015
Available online xxxx
Keywords:
Mixture model
Model selection
Nonnormal data
Abstract
The performances of fit indices used for model selection in cross-sectional mixture
modeling with nonnormally distributed indicators were examined in two studies using
Monte Carlo methods. Simulation conditions were selected to mirror conditions found
in educational and psychological research. The design factors under investigation were:
indicator distribution, number of indicators, sample size, and profile prevalence. All
models contained 5, 10, or 15 continuous indicators with varying departures from
normality. The fit indices examined were Akaike’s information criterion (AIC), corrected
Akaike’s information criterion (AICc), consistent Akaike’s information criterion (CAIC),
Bayesian information criterion (BIC), sample size-adjusted Bayesian information criterion
(SSBIC), Draper’s information criterion (DIC), integrated classification likelihood criterion
with Bayesian-type approximation (ICL), entropy, and the adjusted Lo–Mendell–Rubin
likelihood ratio test (LMR). In the first study, nonnormally distributed data were used to
estimate the mixture models. No fit index uniformly identified the simulated number of
profiles using nonnormal indicators. The fit indices that tended to identify the simulated
number of profiles more frequently than others were BIC, SSBIC, CAIC, and LMR, although
the condition(s) in which this was observed varied. In the second study, the raw data were
transformed using van der Waerden quantile normal scores. Despite deflating the indicator
variances, the use of normal scores increased the frequency with which fit indices identified
the simulated number of profiles across most conditions.
© 2015 Elsevier B.V. All rights reserved.
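The van der Waerden transformation used in the second study replaces each raw value with Φ⁻¹(r/(n + 1)), where r is the value's rank and Φ⁻¹ is the standard normal quantile function. The paper's analyses were carried out in Mplus and SAS; the following is only a minimal Python sketch of the transformation itself (the function name and the average-rank handling of ties are illustrative choices, not taken from the article):

```python
# Minimal sketch of van der Waerden quantile normal scores:
# each observation is replaced by Phi^{-1}(rank / (n + 1)).
# Uses only the Python standard library.
from statistics import NormalDist


def van_der_waerden_scores(x):
    """Return normal scores for sample x, using average ranks for ties."""
    n = len(x)
    sorted_x = sorted(x)
    ranks = []
    for v in x:
        first = sorted_x.index(v) + 1      # 1-based rank of first occurrence
        count = sorted_x.count(v)          # number of tied values
        ranks.append(first + (count - 1) / 2)
    inv = NormalDist().inv_cdf             # standard normal quantile function
    return [inv(r / (n + 1)) for r in ranks]
```

Because the scores depend only on ranks, the transformed sample is symmetric by construction (e.g., for three distinct values the middle score is 0), which is what pulls skewed indicators toward normality while, as the abstract notes, deflating the indicator variances.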
1. Introduction
Classification procedures have been used for decades by researchers interested in classifying individual cases of a hetero-
geneous dataset into homogeneous groups. During this time, classification methods have been applied in many disciplines,
such as business, education, medicine, and the social sciences. Generally, classification refers to the process of dividing
a large, heterogeneous set of observations into smaller, homogeneous groups with less within-group variability and
greater between-group variability (Clogg, 1995; Gordon, 1981; Heinen, 1996; Muthén and Muthén, 2000). The primary
challenge facing researchers is that the frequency and form of the groups underlying a complex dataset are rarely known in
advance. The frequency of the groups refers to the number and size of each group, and the form refers to the group-specific
∗ Corresponding author. Tel.: +1 254 710 7231; fax: +1 254 710 3265.
E-mail addresses: grant_morgan@baylor.edu (G.B. Morgan), kari_hodge@baylor.edu (K.J. Hodge), abaggett@umhb.edu (A.R. Baggett).
1 There is supplementary material comprising the tables that contain the frequency with which each fit index identified the competing component models, along with a sample of the Mplus and SAS code.
http://dx.doi.org/10.1016/j.csda.2015.02.019
0167-9473