Score Comparability of Short Forms and Computerized Adaptive Testing: Simulation Study With the Activity Measure for Post-Acute Care

Stephen M. Haley, PhD, PT, Wendy J. Coster, PhD, OTR, Patricia L. Andres, MS, PT, Mark Kosinski, MA, Pengsheng Ni, MD, MPH

ABSTRACT. Haley SM, Coster WJ, Andres PL, Kosinski M, Ni P. Score comparability of short forms and computerized adaptive testing: simulation study with the Activity Measure for Post-Acute Care. Arch Phys Med Rehabil 2004;85:661-6.

Objective: To compare simulated short-form and computerized adaptive testing (CAT) scores to scores obtained from complete item sets for each of the 3 domains of the Activity Measure for Post-Acute Care (AM-PAC).

Design: Prospective study.

Setting: Six postacute health care networks in the greater Boston metropolitan area, including inpatient acute rehabilitation, transitional care units, home care, and outpatient services.

Participants: A convenience sample of 485 adult volunteers who were receiving skilled rehabilitation services.

Interventions: Not applicable.

Main Outcome Measures: Inpatient and community-based short forms and CAT applications were developed for each of 3 activity domains (physical & mobility, personal care & instrumental, applied cognition) using item pools constructed from new items and items from existing postacute care instruments.

Results: Simulated CAT scores correlated highly with score estimates from the total item pool in each domain (4- and 6-item CAT r range, .90–.95; 10-item CAT r range, .96–.98). Scores on the 10-item short forms constructed for inpatient and community settings also provided good estimates of the AM-PAC item pool scores for the physical & movement and personal care & instrumental domains, but were less consistent in the applied cognition domain. Confidence intervals around individual scores were greater in the short forms than for the CATs.
Conclusions: Accurate scoring estimates for AM-PAC domains can be obtained with either the setting-specific short forms or the CATs. The strong relationship between CAT and item pool scores can be attributed to the CAT’s ability to select specific items to match individual responses. The CAT may have additional advantages over short forms in practicality, efficiency, and the potential for providing more precise scoring estimates for individuals.

Key Words: Activities of daily living; Outcomes research; Rehabilitation.

© 2004 by the American Congress of Rehabilitation Medicine and the American Academy of Physical Medicine and Rehabilitation

AS PATIENTS RECOVER from illness or injury, an assessment system to measure a continuously changing repertoire of functional skills is needed throughout the continuum of postacute care services. Despite mounting interest, no system has emerged that can effectively measure functional outcomes across settings.1,2 We have highlighted 3 problems that currently plague outcome measurement in postacute care settings: limited breadth, poor precision, and lack of feasibility.3

Measurement precision is optimal when the content of functional items and the patients’ abilities are closely matched. However, in heterogeneous groups, such as are seen in postacute care services, an optimal set of items that fits most patients in a particular subgroup may not be relevant for all patients in the larger group. Therefore, any one instrument developed for a specific setting typically has considerable floor and ceiling effects when used in other postacute care settings.4 To make instruments more practical, the range of content is often compromised, leading to large amounts of measurement noise at various levels of the scale. However, at the level of individual patient assessment, precision is required if either treatment or placement decisions are based on functional scores.
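The matching principle stated above can be illustrated with a minimal sketch. It assumes a dichotomous Rasch model, chosen here only for simplicity; the text at this point does not specify the model used to calibrate the AM-PAC items. All function names and the item difficulties are hypothetical. Under this assumption, an item's information is P(1 - P), which peaks when the item's difficulty equals the person's ability, so items far from a patient's level add little precision.

```python
import math


def rasch_probability(theta, difficulty):
    """Probability of a positive response under a dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def item_information(theta, difficulty):
    """Fisher information of one Rasch item at ability theta: P * (1 - P)."""
    p = rasch_probability(theta, difficulty)
    return p * (1.0 - p)


def most_informative_item(theta, difficulties):
    """Index of the item whose information is largest at ability theta."""
    return max(
        range(len(difficulties)),
        key=lambda i: item_information(theta, difficulties[i]),
    )


def standard_error(theta, difficulties):
    """Approximate standard error of the ability estimate for a set of items."""
    total_information = sum(item_information(theta, b) for b in difficulties)
    return 1.0 / math.sqrt(total_information)


# Hypothetical item difficulties (in logits), for illustration only.
hypothetical_pool = [-3.0, -1.5, 0.0, 1.5, 3.0]

if __name__ == "__main__":
    ability = 1.4
    chosen = most_informative_item(ability, hypothetical_pool)
    print("Most informative item index:", chosen)
    print("SE with 3 well-matched items:", standard_error(0.0, [0.0, 0.0, 0.0]))
    print("SE with 3 poorly matched items:", standard_error(0.0, [3.0, 3.0, 3.0]))
```

In this sketch, three items located at the person's ability level give a smaller standard error than three items located 3 logits away, which is the sense in which a fixed item set tuned to one subgroup can be imprecise for patients elsewhere on the scale.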
To achieve comprehensiveness and precision within a fixed-item format, some monitoring systems (eg, the recently proposed Minimum Data Set–Post Acute Care)5 are cumbersome and impractical. Collectively, the lack of breadth, unequal precision for all patients, and the limited feasibility of current systems severely restrict the field’s ability to measure and analyze rehabilitation progress across the continuum of postacute care settings.6-9

Recently, there has been intense interest in the application of item response theory (IRT) to develop the next generation of practical and precise instruments for monitoring functional recovery,10-12 by overcoming the unremitting breadth, precision, and practicality challenges. To realize many of the potential measurement advantages of IRT, item pools13 are developed that contain high-quality items to tap many levels of functional abilities. Item pools are often built by equating functional items from different sources so that they can be linked to form a comprehensive sample of abilities on a common, underlying metric. The development of conceptually valid item pools along meaningful functional dimensions appears to hold promise for the creation of fixed-length short forms and computerized adaptive testing (CAT) systems,13,14 perhaps revolutionizing the manner in which assessments are administered and scored in clinical practice. However, many

From the Research and Training Center on Measuring Rehabilitation Outcomes, Center for Rehabilitation Effectiveness, Sargent College of Health and Rehabilitation Sciences, Boston University, Boston, MA (Haley, Coster, Andres, Ni); and QualityMetric Inc, Lincoln, RI (Kosinski).

Supported in part by the National Institute on Disability and Rehabilitation Research (grant no. H133B990005), the National Institute of Child Health and Human Development (grant no. R01 HD43568), and the Agency for Healthcare Research and Quality.
The contents of this article are solely the responsibility of the authors and do not necessarily represent the official views of the funders.

No commercial party having a direct financial interest in the results of the research supporting this article has or will confer a benefit upon the author(s) or upon any organization with which the author(s) is/are associated.

Reprint requests to Stephen M. Haley, PhD, PT, Research and Training Center on Measuring Rehabilitation Outcomes, Center for Rehabilitation Effectiveness, Sargent College of Health and Rehabilitation Sciences, Boston University, 635 Commonwealth Ave, Boston, MA 02215, e-mail: smhaley@bu.edu.

0003-9993/04/8504-8077$30.00/0
doi:10.1016/j.apmr.2003.08.097

Arch Phys Med Rehabil Vol 85, April 2004