PSYCHOMETRIKA, VOL. 78, NO. 1, 116–133
JANUARY 2013
DOI: 10.1007/S11336-012-9293-1

HOW SHOULD WE ASSESS THE FIT OF RASCH-TYPE MODELS? APPROXIMATING THE POWER OF GOODNESS-OF-FIT STATISTICS IN CATEGORICAL DATA ANALYSIS

ALBERTO MAYDEU-OLIVARES
FACULTY OF PSYCHOLOGY, UNIVERSITY OF BARCELONA

ROSA MONTAÑO
UNIVERSIDAD DE SANTIAGO DE CHILE

We investigate the performance of three statistics, R_1, R_2 (Glas in Psychometrika 53:525–546, 1988), and M_2 (Maydeu-Olivares & Joe in J. Am. Stat. Assoc. 100:1009–1020, 2005, Psychometrika 71:713–732, 2006) to assess the overall fit of a one-parameter logistic model (1PL) estimated by (marginal) maximum likelihood (ML). R_1 and R_2 were specifically designed to target specific assumptions of Rasch models, whereas M_2 is a general purpose test statistic. We report asymptotic power rates under some interesting violations of model assumptions (different item discrimination, presence of guessing, and multidimensionality) as well as empirical rejection rates for correctly specified models and some misspecified models. All three statistics were found to be more powerful than Pearson's X^2 against two- and three-parameter logistic alternatives (2PL and 3PL), and against multidimensional 1PL models. The results suggest that there is no clear advantage in using goodness-of-fit statistics specifically designed for Rasch-type models to test these models when marginal ML estimation is used.

Key words: discrete data, power, IRT, maximum likelihood.

1. Introduction

Broadly speaking, item response theory (IRT) refers to the class of latent trait models for discrete multivariate data obtained by coding the responses to a set of questionnaire items, such as those found in educational tests, personality inventories, etc. Rasch-type models are a subset of IRT models, so named after the pioneering work of Rasch (1960).
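For readers less familiar with the models named in the abstract, the following is a minimal sketch of the item response functions usually associated with the 1PL, 2PL, and 3PL labels. It assumes the common textbook logistic parameterization, which may differ from the one used later in this article, and it fixes the common 1PL discrimination at 1 for simplicity. The function names (`p_3pl`, `p_2pl`, `p_1pl`) and the symbols `a` (discrimination), `b` (difficulty), and `c` (lower asymptote, or guessing) are illustrative choices rather than the authors' notation. The multidimensional alternative is not covered.

```python
import math


def p_3pl(theta, a, b, c):
    """Probability of a correct response under the textbook 3PL form.

    theta: latent trait, a: discrimination, b: difficulty,
    c: lower asymptote (guessing). All names are illustrative assumptions.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def p_2pl(theta, a, b):
    """2PL: the 3PL with no guessing (c = 0)."""
    return p_3pl(theta, a, b, 0.0)


def p_1pl(theta, b):
    """1PL: the 2PL with one discrimination shared by all items.

    The shared value is fixed at 1 here, which is a simplifying assumption.
    """
    return p_2pl(theta, 1.0, b)


if __name__ == "__main__":
    theta, b = 0.5, 0.0
    print("1PL:", p_1pl(theta, b))
    print("2PL (a = 2):", p_2pl(theta, 2.0, b))
    print("3PL (a = 2, c = 0.2):", p_3pl(theta, 2.0, b, 0.2))
```

In this parameterization, unequal item discrimination corresponds to letting `a` vary across items, and guessing corresponds to `c > 0`; both reduce to the 1PL when `a` is common to all items and `c = 0`.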
Rasch-type models are characterized by two properties (McDonald, 1999): (a) the sum score is a sufficient statistic for the latent traits, and (b) comparisons of subpopulations are made independently of the item or items used for the comparison (the so-called specific objectivity property). Although only highly restrictive IRT models can satisfy these properties, their mathematical potential has led some researchers to prefer them to all other IRT models. Thus, we may distinguish between two traditions in IRT modeling: a model-based tradition and a data-based tradition. In the model-based tradition, a model with appealing mathematical properties is selected first (a Rasch-type model) and tests are designed to fit the model. By contrast, in a data-based tradition, different models within the IRT family are explored to find the best fitting model for the available data.

Because of the availability of sufficient statistics for the latent traits that do not depend on item parameters, estimation methods (conditional maximum likelihood, or CML) and goodness-of-fit testing procedures have been developed specifically for Rasch-type models (for an overview

This research was supported by an ICREA-Academia Award and Grant SGR 2009 74 from the Catalan Government, and by Grants PSI2009-07726 and PR2010-0252 from the Spanish Ministry of Education awarded to the first author, and by a Dissertation Research Award of the Society of Multivariate Experimental Psychology awarded to the second author. The authors are indebted to the reviewers and to David Thissen for comments that improved the manuscript.

Requests for reprints should be sent to Alberto Maydeu-Olivares, Faculty of Psychology, University of Barcelona, P. Valle de Hebrón, 171, 08035 Barcelona, Spain. E-mail: amaydeu@ub.edu

© 2012 The Psychometric Society