Fast Free-Form Deformation using the Normalised Mutual Information gradient and Graphics Processing Units

Marc Modat¹, Zeike A. Taylor¹, Josephine Barnes², David J. Hawkes¹, Nick C. Fox², and Sébastien Ourselin¹

¹ Centre for Medical Imaging Computing, Department of Medical Physics and Bioengineering, University College London, UK
² Dementia Research Centre, Institute of Neurology, University College London, UK

Abstract. Non-rigid registration is a tool commonly used in medical image analysis. However, techniques are usually time consuming. In this paper we present a fast registration framework which is a modification of the well-known Free-Form Deformation (FFD) algorithm. Our algorithm uses the analytical Normalised Mutual Information gradient, which leads to a highly parallel framework. Its implementation is therefore suitable for execution on Graphics Processing Units. We apply the new method to estimate brain atrophy in Alzheimer's disease subjects and show that its accuracy is similar to that of the classical FFD, while the computation time is dramatically decreased.

1 Introduction

In longitudinal studies of atrophy, the Boundary Shift Integral [1] (BSI) technique is widely used [2]. However, this method is labour-intensive as it requires segmentation of the brain in both the baseline and repeat scans. Importantly, segmentations are only semi-automatic and thus require significant operator time. Clinical trial sizes are increasing, and consequently so is the time spent segmenting brain scans. Using the Jacobian Integration [3] (JI), the segmentation time can be halved, as only one brain segmentation is necessary. The JI requires a non-rigid registration pre-step to compute the Jacobian determinant map. The most common non-rigid frameworks used in clinical trials are the fluid [4] and the Free-Form Deformation [5] (FFD) algorithms. FFD has been shown to perform well in atrophy estimation for Alzheimer's disease patients [6].
Although FFD appears to be more accurate than the BSI [3], it is very time consuming. Some groups have implemented supercomputer-based [7] or FPGA-based [8] solutions to accommodate time constraints. However, these kinds of hardware are either costly or require specialised skills. Graphics Processing Unit (GPU) based computation is a more accessible high-performance alternative which has been shown to be effective for computationally expensive applications [9]. However, since GPUs are highly parallel devices, this effectiveness depends on the level