Data warehousing methods and processing infrastructure for brain recovery research

T. GEE 1, S. KENNY 2, C.J. PRICE 3, M.L. SEGHIER 3, S.L. SMALL 2, A.P. LEFF 3,4, A. PACURAR 1, S.C. STROTHER 1,5

1 Rotman Research Institute and Centre for Stroke Recovery, Baycrest, Toronto, Canada;
2 Computation Institute, University of Chicago, USA;
3 Wellcome Trust Centre for Neuroimaging, Institute of Neurology, University College London, UK;
4 Institute of Cognitive Neuroscience, University College London, UK;
5 Medical Biophysics Department and Institute of Medical Sciences, University of Toronto, Canada

Archives Italiennes de Biologie, 148: 207-217, 2010.

Corresponding Author: Tom Gee, 3560 Bathurst Street, Toronto, ON, Canada M6A 2E1 - Email: tgee@rotman-baycrest.on.ca

ABSTRACT

In order to accelerate translational neuroscience with the goal of improving clinical care it has become important to support rapid accumulation and analysis of large, heterogeneous neuroimaging samples and their metadata from both normal control and patient groups. We propose a multi-centre, multinational approach to accelerate the data mining of large samples and facilitate data-led clinical translation of neuroimaging results in stroke. Such data-driven approaches are likely to have an early impact on clinically relevant brain recovery while we simultaneously pursue the much more challenging model-based approaches that depend on a deep understanding of the complex neural circuitry and physiological processes that support brain function and recovery. We present a brief overview of three (potentially converging) approaches to neuroimaging data warehousing and processing that aim to support these diverse methods for facilitating prediction of cognitive and behavioral recovery after stroke, or other types of brain injury or disease.

Key words
Neuroinformatics • Database • Stroke • Research • Data mining

Introduction

The task of managing, storing and maintaining large datasets for stroke recovery research requires a combination of data-management techniques. Many of these approaches are also being intensively developed in the context of neuroinformatics infrastructure for multi-centre clinical trials for other brain disorders (Van Horn and Toga, 2009a), although there has been some resistance among the brain imaging science community to adopting the large-scale neuroinformatics infrastructures now available (Van Horn and Toga, 2009b). This trend is part of a larger movement in the biological sciences called “bio-imaging informatics” (Peng, 2008).

We present three types of database-centered infrastructure techniques that address different aspects of the stroke research community’s needs, and briefly discuss the idea that some combination of all these approaches is necessary for answering important questions about how the brain recovers from injury or illness. Our overall aim is to integrate these three database and processing frameworks across our multi-national centres to accelerate translational neuroscience with the goal of directly improving clinical care and recovery following stroke.

First we present the Predicting Language Outcome and Recovery after Stroke System developed in Britain (Price et al., 2010). The PLORAS system is used to make predictions on the basis of the recov-