Technical Note
Within-subject variation in BOLD-fMRI signal changes across
repeated measurements: Quantification and implications
for sample size
Bram B. Zandbelt a,⁎, Thomas E. Gladwin a, Mathijs Raemaekers c, Mariët van Buuren a, Sebastiaan F. Neggers a, René S. Kahn a, Nick F. Ramsey b, and Matthijs Vink a
a Rudolf Magnus Institute of Neuroscience, Department of Psychiatry, University Medical Center Utrecht, Utrecht, The Netherlands
b Rudolf Magnus Institute of Neuroscience, Department of Neurosurgery, University Medical Center Utrecht, Utrecht, The Netherlands
c Helmholtz Institute, Department of Functional Neurobiology, University of Utrecht, Utrecht, The Netherlands
Received 5 September 2007; revised 8 February 2008; accepted 14 April 2008
Available online 24 April 2008
Functional magnetic resonance imaging (fMRI) can be used to detect
experimental effects on brain activity across measurements. The success of
such studies depends on the size of the experimental effect, the reliability of
the measurements, and the number of subjects. Here, we report on the
stability of fMRI measurements and provide estimates of the sample sizes
needed for repeated-measurement studies. Stability was quantified in
terms of the within-subject standard deviation ($\sigma_w$) of BOLD signal
changes across measurements. In contrast to correlation measures of
stability, this statistic does not depend on the between-subjects variance in
the sampled group. Sample sizes required for repeated measurements of
the same subjects were calculated using this $\sigma_w$. Ten healthy subjects
performed a motor task on three occasions, separated by one week, while
being scanned. In order to exclude training effects on fMRI stability, all
subjects were trained extensively on the task. Task performance, spatial
activation pattern, and group-wise BOLD signal changes were highly
stable over sessions. In contrast, we found substantial fluctuations (up to
half the size of the group mean activation level) in individual activation
levels, both in ROIs and in voxels. Given this large degree of instability
over sessions, and the fact that the amount of within-subject variation
plays a crucial role in determining the success of an fMRI study with
repeated measurements, improving stability is essential. To guide
future studies, we provide sample sizes for a range of experimental
effects and levels of stability. Estimates of both quantities are
needed to select an appropriate number of subjects.
© 2008 Elsevier Inc. All rights reserved.
Keywords: fMRI; Reliability; Within-subject variation; Sample size; Motor
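The abstract's claim that $\sigma_w$, unlike a correlation between sessions, does not depend on the between-subject variance of the sampled group can be illustrated with a minimal Python sketch on made-up numbers. It assumes $\sigma_w$ is estimated as the square root of the within-subject mean square from a one-way ANOVA with subjects as groups (the usual Bland–Altman definition); this section does not state the exact estimator used in the study. The helpers `within_subject_sd`, `pearson_r`, and `make_group` and all data values are hypothetical. The two hypothetical groups share identical session-to-session fluctuations and differ only in how widely the subjects' mean activation levels are spread.

```python
import math
from statistics import mean

def within_subject_sd(data):
    """Within-subject SD: square root of the within-subject mean square of a
    one-way ANOVA with subjects as groups (equal number of sessions each)."""
    k = len(data[0])  # sessions per subject
    ss_within = sum((x - mean(row)) ** 2 for row in data for x in row)
    return math.sqrt(ss_within / (len(data) * (k - 1)))

def pearson_r(x, y):
    """Pearson correlation between two sessions across subjects."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical session-to-session fluctuations (% signal change), shared by
# both groups; each row sums to zero, so each subject's mean equals its level.
noise = [[0.1, -0.1, 0.0], [0.0, 0.1, -0.1], [-0.1, 0.0, 0.1],
         [0.1, 0.0, -0.1], [0.0, -0.1, 0.1]]

def make_group(levels):
    """Add the shared fluctuations to each subject's true activation level."""
    return [[lvl + e for e in row] for lvl, row in zip(levels, noise)]

narrow = make_group([1.0, 1.1, 1.2, 1.3, 1.4])  # small between-subject spread
wide = make_group([0.4, 0.8, 1.2, 1.6, 2.0])    # large between-subject spread

for name, group in (("narrow", narrow), ("wide", wide)):
    s1 = [row[0] for row in group]  # session 1
    s2 = [row[1] for row in group]  # session 2
    print(name, round(within_subject_sd(group), 3), round(pearson_r(s1, s2), 2))
# narrow 0.1 0.67
# wide 0.1 0.98
```

In this sketch $\sigma_w$ is 0.1 in both groups, while the session 1 vs. session 2 correlation rises from about 0.67 to about 0.98 only because the subjects differ more from one another.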
Introduction
The effect of an intervention, for example pharmacological
treatment or repetitive transcranial magnetic stimulation (rTMS),
can be investigated with repeated measurements on the same
subjects. By administering the experimental and control treatments in
random order to the same group of subjects, the mean difference
between treatment conditions can be calculated and tested for
statistical significance. Recently, this type of study (i.e., a crossover
design) has been applied to functional MRI (fMRI). For instance,
fMRI signal changes were observed in the motor cortex of patients
recovering from stroke after treatment with fluoxetine (Pariente
et al., 2001), in the amygdala following oxytocin administration
(Kirsch et al., 2005), and in the prefrontal cortex in response to a
catechol-O-methyltransferase (COMT) inhibitor (Apud et al., 2007).
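Testing the mean difference between treatment conditions in such a crossover design amounts to a paired comparison: each subject contributes one value per condition, and the test is run on the per-subject differences. The following minimal Python sketch computes a paired t statistic; the helper `paired_t` and the ROI values are hypothetical, and the sketch ignores treatment order and carry-over effects, which a real crossover analysis would need to check.

```python
import math
from statistics import mean, stdev

def paired_t(experimental, control):
    """Paired t statistic on per-subject differences (experimental - control).
    Returns the mean difference, the t statistic, and the degrees of freedom."""
    diffs = [e - c for e, c in zip(experimental, control)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return mean(diffs), t, n - 1

# Hypothetical ROI signal changes (%) for 10 subjects under each treatment
control = [1.0, 1.2, 0.9, 1.1, 1.3, 1.0, 0.8, 1.2, 1.1, 0.9]
experimental = [1.3, 1.3, 1.1, 1.5, 1.3, 1.2, 1.1, 1.3, 1.3, 1.1]

d_mean, t, df = paired_t(experimental, control)
print(f"mean difference = {d_mean:.2f}, t({df}) = {t:.2f}")
# mean difference = 0.20, t(9) = 5.48
```

With 9 degrees of freedom the two-sided critical value at α = 0.05 is about 2.262, so this hypothetical effect would be judged significant. Because each subject serves as their own control, stable between-subject differences cancel in the differences, and only within-subject variation enters the denominator.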
The success of such a design depends on statistical power, which in
turn depends on (a) the difference between experimental and control
treatment, (b) measurement error, and (c) sample size. For single-
session fMRI studies, the effect of measurement error on statistical
power and sample size has been determined (Desmond and Glover,
2002). These findings may not be valid for fMRI studies with
multiple sessions, however, as factors that are stable within a session
(e.g. subject position in the scanner) can differ between sessions
(Genovese et al., 1997). To obtain an estimate of this between-session
measurement error, a test–retest reliability analysis should be performed,
measuring the same variable in the same sample of subjects in the absence
of any experimental manipulation between measurements.
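The dependence of power on (a) the treatment difference, (b) measurement error, and (c) sample size can be turned into a rough sample-size rule for a two-session comparison. The sketch below is an illustration under stated assumptions, not the procedure used in the study: it takes the between-session measurement error to be the within-subject SD $\sigma_w$, assumes independent session errors and an identical true effect $\Delta$ in every subject (so a within-subject difference has SD $\sqrt{2}\,\sigma_w$), and uses the normal approximation $n = \left( (z_{1-\alpha/2} + z_{1-\beta}) \sqrt{2}\,\sigma_w / \Delta \right)^2$. The helper `subjects_needed` and the example values are hypothetical.

```python
import math
from statistics import NormalDist

def subjects_needed(sigma_w, delta, alpha=0.05, power=0.80):
    """Approximate number of subjects for a two-session within-subject
    comparison, via a normal approximation to the paired test.

    sigma_w: within-subject SD of the BOLD signal change across sessions
    delta:   expected true difference between treatment conditions
    """
    z = NormalDist().inv_cdf  # standard normal quantile function
    # Assumption: independent session errors, so SD of a difference = sqrt(2) * sigma_w
    sigma_diff = math.sqrt(2) * sigma_w
    n = ((z(1 - alpha / 2) + z(power)) * sigma_diff / delta) ** 2
    return math.ceil(n)  # round up to whole subjects

for sigma_w in (0.1, 0.2):
    for delta in (0.1, 0.2):
        print(f"sigma_w={sigma_w}, delta={delta}: n={subjects_needed(sigma_w, delta)}")
# sigma_w=0.1, delta=0.1: n=16
# sigma_w=0.1, delta=0.2: n=4
# sigma_w=0.2, delta=0.1: n=63
# sigma_w=0.2, delta=0.2: n=16
```

Doubling $\sigma_w$ at a fixed $\Delta$ roughly quadruples the required sample size, which is why the between-session measurement error matters so much for repeated-measurement designs. The normal approximation ignores the extra uncertainty of the t distribution, so it understates n at small sample sizes (the n = 4 case is optimistic).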
A number of studies have investigated the test–retest reliability
of fMRI, reporting reliability ranging from almost perfect (Aron et al.,
2006; Fernandez et al., 2003; Specht et al., 2003) to at best moderate
(Raemaekers et al., 2007; Wei et al., 2004). The majority of studies
expressed test–retest reliability of fMRI signal changes in terms of
www.elsevier.com/locate/ynimg
NeuroImage 42 (2008) 196–206
⁎ Corresponding author. Rudolf Magnus Institute of Neuroscience,
University Medical Center Utrecht, Room A.01.126, P.O. Box 85500,
NL-3508 GA Utrecht, The Netherlands. Fax: +31 88 7555443.
E-mail address: b.b.zandbelt@umcutrecht.nl (B.B. Zandbelt).
Available online on ScienceDirect (www.sciencedirect.com).
1053-8119/$ - see front matter © 2008 Elsevier Inc. All rights reserved.
doi:10.1016/j.neuroimage.2008.04.183