Technical Note

Within-subject variation in BOLD-fMRI signal changes across repeated measurements: Quantification and implications for sample size

Bram B. Zandbelt a,⁎, Thomas E. Gladwin a, Mathijs Raemaekers c, Mariët van Buuren a, Sebastiaan F. Neggers a, René S. Kahn a, Nick F. Ramsey b, and Matthijs Vink a

a Rudolf Magnus Institute of Neuroscience, Department of Psychiatry, University Medical Center Utrecht, Utrecht, The Netherlands
b Rudolf Magnus Institute of Neuroscience, Department of Neurosurgery, University Medical Center Utrecht, Utrecht, The Netherlands
c Helmholtz Institute, Department of Functional Neurobiology, University of Utrecht, Utrecht, The Netherlands

Received 5 September 2007; revised 8 February 2008; accepted 14 April 2008
Available online 24 April 2008

Functional magnetic resonance imaging (fMRI) can be used to detect experimental effects on brain activity across measurements. The success of such studies depends on the size of the experimental effect, the reliability of the measurements, and the number of subjects. Here, we report on the stability of fMRI measurements and provide sample size estimations needed for repeated measurement studies. Stability was quantified in terms of the within-subject standard deviation (σ_w) of BOLD signal changes across measurements. In contrast to correlation measures of stability, this statistic does not depend on the between-subjects variance in the sampled group. Sample sizes required for repeated measurements of the same subjects were calculated using this σ_w. Ten healthy subjects performed a motor task on three occasions, separated by one week, while being scanned. In order to exclude training effects on fMRI stability, all subjects were trained extensively on the task. Task performance, spatial activation pattern, and group-wise BOLD signal changes were highly stable over sessions.
In contrast, we found substantial fluctuations (up to half the size of the group mean activation level) in individual activation levels, both in ROIs and in voxels. Given this large degree of instability over sessions, and the fact that the amount of within-subject variation plays a crucial role in determining the success of an fMRI study with repeated measurements, improving stability is essential. In order to guide future studies, sample sizes are provided for a range of experimental effects and levels of stability. Obtaining estimates of these latter two variables is essential for selecting an appropriate number of subjects.
© 2008 Elsevier Inc. All rights reserved.

Keywords: fMRI; Reliability; Within-subject variation; Sample size; Motor

Introduction

The effect of an intervention, for example pharmacological treatment or repetitive transcranial magnetic stimulation (rTMS), can be investigated with repeated measurements on the same subjects. By administering experimental and control treatment in random order to the same group of subjects, the mean difference between treatment conditions can be calculated and tested for statistical significance. Recently, this type of study (i.e. crossover design) has been applied to functional MRI (fMRI). For instance, fMRI signal changes were observed in the motor cortex of patients recovering from stroke after treatment with fluoxetine (Pariente et al., 2001), in the amygdala following oxytocin administration (Kirsch et al., 2005), and in the prefrontal cortex in response to a catechol-O-methyltransferase inhibitor (Apud et al., 2007). The success of such a design depends on statistical power, which in turn depends on (a) the difference between experimental and control treatment, (b) measurement error, and (c) sample size. For single-session fMRI studies, the effect of measurement error on statistical power and sample size has been determined (Desmond and Glover, 2002).
These findings may not be valid for fMRI studies with multiple sessions, however, as factors that are stable within a session (e.g. subject position in the scanner) can differ between sessions (Genovese et al., 1997). To obtain an estimate of this between-session measurement error, a test–retest reliability analysis measuring the same variable on the same sample of subjects should be performed, in the absence of any between-measurement experimental manipulation.

NeuroImage 42 (2008) 196–206, www.elsevier.com/locate/ynimg
⁎ Corresponding author. Rudolf Magnus Institute of Neuroscience, University Medical Center Utrecht, Room A.01.126, P.O. Box 85500, NL-3508 GA Utrecht, The Netherlands. Fax: +31 88 7555443. E-mail address: b.b.zandbelt@umcutrecht.nl (B.B. Zandbelt).
doi:10.1016/j.neuroimage.2008.04.183

A number of studies have investigated the test–retest reliability of fMRI, reporting reliability that ranges from almost perfect (Aron et al., 2006; Fernandez et al., 2003; Specht et al., 2003) to at best moderate (Raemaekers et al., 2007; Wei et al., 2004). The majority of studies expressed test–retest reliability of fMRI signal changes in terms of