J Chron Dis Vol. 39, No. 11, pp. 897-906, 1986. Printed in Great Britain. All rights reserved. 0021-9681/86 $3.00 + 0.00. Copyright © 1986 Pergamon Journals Ltd.

ASSESSING THE RESPONSIVENESS OF FUNCTIONAL SCALES TO CLINICAL CHANGE: AN ANALOGY TO DIAGNOSTIC TEST PERFORMANCE

RICHARD A. DEYO and ROBERT M. CENTOR

Division of General Internal Medicine, Department of Medicine, University of Texas Health Science Center at San Antonio, San Antonio, TX 78284, and the Division of General Medicine and Primary Care, Medical College of Virginia

(Received in revised form 13 February 1986)

Abstract: One characteristic of newer health or functional status scales which has received little attention is their responsiveness over time to clinical change. In part, this is because methods for assessing this characteristic are crude and not well standardized. We suggest that scales be viewed as "diagnostic tests" for discriminating between improved and unimproved patients. With this perspective, one may construct receiver operating characteristic (ROC) curves describing a scale's ability to detect improvement (or failure to improve) using some external criterion. This method is illustrated using data from a study of acute low back pain, comparing the Sickness Impact Profile (SIP), its major subscales, and a shorter, more disease-specific scale. The results demonstrate an advantage of the ROC approach over simple pre- and post-treatment score comparisons in assessing scale responsiveness. They also suggest some advantage for a brief disease-specific scale over the lengthier "generic" SIP.

INTRODUCTION

The development and testing of scales for measuring function or "health status" requires attention to reliability, validity, and responsiveness of scales to clinical changes which occur over time.
When functional scales are used as outcome measures in clinical trials, responsiveness is crucial because it determines, in part, the statistical power of a trial: its ability to detect a difference between treatments when one is present [1]. While methods, terminology, and statistics for assessing reliability and validity are reasonably standardized [2, 3], this is not true for assessing responsiveness to change. As a result, this characteristic has received insufficient attention in the development of functional scales. Furthermore, when competing scales exist for a given purpose, there are virtually no comparative data to indicate which may be more responsive [4].

We argue that assessing the responsiveness of functional scales is analogous to assessing the discriminating properties of a diagnostic test. In this case, the condition to be "diagnosed" is whether or not a clinically important change has occurred. Functional scale scores show random variability over time and, like other diagnostic tests, never provide perfect measurements. Thus, there will be "true positive" and "false positive" changes in functional scores over time. One may therefore describe a scale's responsiveness in terms of its sensitivity and specificity in detecting improvement or deterioration, as established by other criteria. The issue is not merely sensitivity to change, but the ability to discriminate between those who improve and those who do not.

Preparation of this paper was assisted by a grant from the Robert Wood Johnson Foundation, Princeton, New Jersey. The opinions, conclusions, and proposals in the text are those of the authors and do not necessarily represent the views of the Robert Wood Johnson Foundation.

Reprint requests should be addressed to: Dr. R. A. Deyo, Health Systems Research and Development, Seattle V.A. Medical Center, 1660 South Columbian Way, Seattle, WA 98108, U.S.A.
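To make the analogy concrete, the idea of treating a change score as a "diagnostic test" for improvement can be sketched in code. The sketch below uses hypothetical data (the patient scores, the `roc_points` helper, and the external-criterion labels are illustrative, not from the authors' low back pain study): for each cutoff on the change score it computes sensitivity among patients judged improved by the external criterion and specificity among those judged unimproved, yielding the points of an ROC curve.

```python
# Illustrative sketch only: a scale's change score used as a "diagnostic
# test" for true clinical improvement, per the ROC framing in the text.

def roc_points(change_scores, improved):
    """change_scores: per-patient scale improvement (higher = more improved).
    improved: external criterion, True if the patient truly improved.
    Returns (1 - specificity, sensitivity) pairs, one per cutoff."""
    points = []
    for cutoff in sorted(set(change_scores), reverse=True):
        # A change score at or above the cutoff is a "positive test".
        tp = sum(1 for s, i in zip(change_scores, improved) if s >= cutoff and i)
        fp = sum(1 for s, i in zip(change_scores, improved) if s >= cutoff and not i)
        fn = sum(1 for s, i in zip(change_scores, improved) if s < cutoff and i)
        tn = sum(1 for s, i in zip(change_scores, improved) if s < cutoff and not i)
        sensitivity = tp / (tp + fn)   # true-positive rate among improvers
        specificity = tn / (tn + fp)   # true-negative rate among non-improvers
        points.append((1 - specificity, sensitivity))
    return points

# Hypothetical example: six patients' change scores and criterion judgments.
scores = [12, 8, 5, 3, 1, -2]
criterion = [True, True, True, False, False, False]
print(roc_points(scores, criterion))
```

Plotting these (1 - specificity, sensitivity) pairs traces the ROC curve; a more responsive scale pushes the curve toward the upper-left corner, which is the basis for the comparisons made later in the paper.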