17 Using Item Response Theory to Evaluate Measurement Invariance in Health-Related Measures

Roger E. Millsap, Heather Gunn, Howard T. Everson, and Alex Zautra

Introduction

For more than a decade, the National Institutes of Health (NIH) has supported the development of the Patient-Reported Outcomes Measurement Information System (PROMIS®) Roadmap initiative (www.nihpromis.org). This effort reflects the growing importance of, and interest in, measures of patients’ self-reported health status in medical and health care research, as well as in clinical practice. The goal of the NIH initiative is to support the development of a large bank of items for measuring patient-reported outcomes (PROs), which could ultimately be assembled into measurement scales for use in both clinical and research settings. With NIH support, the PROMIS® development initiative has been able to make both the items and the scales available to clinicians and researchers (Cella et al., 2007).

Our main goal in this chapter is to offer researchers and others working with PROMIS® and other scales a model-based psychometric perspective on how best to identify items that may function differently for different patient groups, and thereby to help meet the challenge of maintaining validity in these health-related measurement scales. In the psychometric literature these model-based methods are described more formally under the general headings of differential item functioning (DIF) or measurement invariance (Millsap, 2011; Osterlind & Everson, 2009; Widaman & Reise, 1997).

With this context in mind, we begin with a review of the definition of measurement invariance and how violations of invariance are distinguished from simple group differences in scores. We also demonstrate how contemporary methods based on item response theory are applied to the challenge of empirically investigating measurement invariance.
We illustrate these methods using a running example based on responses from a community sample to a widely used health survey measure, the SF-36 scale.

Measurement Invariance

The idea of measurement invariance originated in attempts to model formally how item responses and test scores vary as a function of the respondent’s status on the latent (unobserved) variable to be measured. A test or questionnaire item may have been designed to measure a particular psychological attribute; for example, a depression scale item may ask about the presence of a symptom of depression. From a psychometric measurement model perspective, we can express the probabilities of possible responses to this item as a function of the respondent’s location on a hypothesized latent scale of depression. We say the depression item is measurement invariant to the extent that, after accounting for the respondent’s status on the latent depression scale, no other systematic influence on the response probabilities exists. Another way of expressing this idea is to say that if two