Development of a Flexilevel Scale for use with computer-adaptive testing for assessing shoulder function
Karon F. Cook, PhD,(a,b,c) Toni S. Roddey, PT, PhD, OCS, FAAOMPT,(d) Kimberly J. O’Malley, PhD,(a,e,f) and Gary M. Gartsman, MD,(g)
Houston and Austin, TX
In a 5-year study, a self-report measure of shoulder function, the Flexilevel Scale of Shoulder Function (FLEX-SF), was developed by use of item response theory. A large pool of candidate items (N = 68) was developed. A questionnaire that included the 68 items, another scale of shoulder function, and clinical and demographic questions was administered to 400 persons with shoulder complaints. Patients’ responses to the 68 items were calibrated by use of Andrich’s rating scale model. Thirty-three items were selected from the pool and subdivided into three overlapping testlets targeting low, medium, and high shoulder function. A table translates raw scores on testlets to a common mathematical metric. The validity and reliability of the FLEX-SF were evaluated in a longitudinal study of 199 patients. The FLEX-SF scores were highly reliable and exhibited excellent validity (including responsiveness). We also report on a simulation of a computer-adaptive test of shoulder function. This simulation is based on the developmental items we tested for use in the FLEX-SF. The results indicate that greater measurement efficiency can be achieved with a computer-adaptive test format. (J Shoulder Elbow Surg 2005;14:90S-94S.)
In recent years, interest in self-reported outcomes has increased substantially. The use of patients’ subjective judgments to evaluate health outcomes implies confidence that the measures that elicit these self-reports are scientifically sound. Psychometrics is the science and mathematics that concerns itself with such issues. The development of an outcome measure by use of psychometric methods is a rigorous, expensive, and time-consuming project. The investment returns confidence that the scale’s scores adequately and accurately portray the outcome of interest. Scientifically sound measurement is fundamental to excellence in research and clinical evaluations.
Our work in evaluating the psychometric properties of existing scales of self-reported shoulder outcome [4,6,20] convinced us of the need to develop a new measure. This report details how, in a 5-year study, psychometric methods were used to develop a self-report measure of shoulder function, the Flexilevel Scale of Shoulder Function (FLEX-SF) [5]. We developed this scale by use of item response theory (IRT) [8], a psychometric method that (1) accounts for differences in item difficulty and (2) supports Flexilevel Scales. A Flexilevel Scale is composed of 2 or more “testlets,” or subsets of items, that target respondents with different levels of the trait being measured. The FLEX-SF measures self-reported shoulder function. Patients respond to an initial “routing item” that classifies them as having low, medium, or high shoulder function. They then respond only to the testlet that best targets their level of shoulder function.
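The routing step described above can be sketched in a few lines of Python. This is a minimal illustration, not the published instrument: the item numbers, the 15-item testlet size, the amount of overlap, and the 0/1/2 coding of the routing response are all assumptions chosen only so that three overlapping testlets cover a 33-item pool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Testlet:
    level: str
    item_ids: tuple


# Placeholder item numbers 1..33. The split below is illustrative only;
# it is NOT the actual FLEX-SF item assignment.
TESTLETS = {
    "low": Testlet("low", tuple(range(1, 16))),         # items 1-15
    "medium": Testlet("medium", tuple(range(10, 25))),  # items 10-24
    "high": Testlet("high", tuple(range(19, 34))),      # items 19-33
}

LEVELS = ("low", "medium", "high")


def select_testlet(routing_response: int) -> Testlet:
    """Map a routing-item response to the testlet that targets that level.

    The coding 0 = low, 1 = medium, 2 = high is a hypothetical convention.
    """
    if routing_response not in (0, 1, 2):
        raise ValueError("routing_response must be 0, 1, or 2")
    return TESTLETS[LEVELS[routing_response]]
```

Under these assumptions, a respondent routed to "medium" answers 15 items rather than the full pool, and the items shared between adjacent testlets are what allow raw scores from different testlets to be placed on a common metric.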
In addition to describing the development of the FLEX-SF, we report on a simulation of a computer-adaptive test (CAT) of shoulder function. This simulation is based on the developmental items we tested for use in the FLEX-SF. CAT-based outcome measures are even more efficient than Flexilevel Scales, and they hold substantial promise in the field of outcomes research.
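To make the efficiency argument concrete, the sketch below shows one common way a CAT can choose its next item: compute category probabilities under a rating scale model of the Andrich form and pick the unadministered item with the greatest information at the current estimate of function. The selection rule, the function names, and the difficulty and threshold values are assumptions for illustration; this report does not specify which rule the simulation used.

```python
import math


def rsm_probs(theta, delta, taus):
    """Category probabilities under a rating scale model.

    theta: person measure; delta: item difficulty;
    taus: category thresholds shared by all items (hypothetical values).
    """
    # Exponent for category k is the sum over j <= k of (theta - delta - tau_j);
    # category 0 has exponent 0.
    exponents = [0.0]
    for tau in taus:
        exponents.append(exponents[-1] + (theta - delta - tau))
    largest = max(exponents)  # subtract the maximum for numerical stability
    weights = [math.exp(e - largest) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


def item_information(theta, delta, taus):
    """Item information at theta, equal to the variance of the item score."""
    probs = rsm_probs(theta, delta, taus)
    mean = sum(k * p for k, p in enumerate(probs))
    return sum((k - mean) ** 2 * p for k, p in enumerate(probs))


def next_item(theta, item_deltas, taus, administered):
    """Return the id of the most informative unadministered item, or None."""
    candidates = {i: d for i, d in item_deltas.items() if i not in administered}
    if not candidates:
        return None
    return max(candidates, key=lambda i: item_information(theta, candidates[i], taus))
```

A call such as `next_item(0.0, {"a": -2.0, "b": 0.0, "c": 2.0}, [-1.0, 0.0, 1.0], set())` selects the item whose difficulty is closest to the current estimate. Because each respondent sees only the items that are most informative for them, a CAT can reach a given precision with fewer items than a fixed testlet.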
ITEM POOL DEVELOPMENT
A first step in the psychometric method is to develop a large pool of items that could potentially be included in the final measure. With a large pool of initial items, scale developers can be selective in choosing the best items for the measure. Scale developers can gather potential items in 3 major ways: (1) adapt published items from other physical function scales, (2) write items based on input from an expert panel, and (3) develop items based on patient interviews. We used each of these in developing the item pool for the FLEX-SF. Existing physical function scales
From the (a) Houston Veterans Affairs Parkinson’s Disease Research and Educational Center, (b) Measurement Excellence and Training Resource Information Center (METRIC): A Veterans Affairs Health Services Research and Development Resource Center, (c) Baylor College of Medicine, (d) Texas Woman’s University, (e) Health Services Research and Development Center for Quality of Care and Utilization Studies, and (g) University of Texas School of Medicine, Houston, and (f) Pearson Educational Measurement, Austin.
Reprint requests: Karon F. Cook, PhD, PADRECC, Houston VAMC
(127-PD), 2002 Holcombe Blvd, Houston, TX 77030 (E-mail:
karonc@bcm.tmc.edu).
Copyright © 2005 by Journal of Shoulder and Elbow Surgery
Board of Trustees.
1058-2746/2005/$30.00
doi:10.1016/j.jse.2004.09.024