An HPC Infrastructure for Processing and Visualizing Neuro-anatomical Images Obtained by Confocal Light Sheet Microscopy

Alessandro Bria, Giulio Iannello, Paolo Soda, Hanchuan Peng, Giovanni Erbacci§, Giuseppe Fiameni§, Giacomo Mariani§, Roberto Mucci§, Marco Rorro§, Francesco Pavone, Ludovico Silvestri, Paolo Frasconi and Roberto Cortini

Department of Electrical and Information Engineering, University of Cassino and Lazio Meridionale, Cassino (FR), Italy
Integrated Research Center, University Campus Bio-Medico of Rome, Italy
Allen Institute for Brain Science, Seattle, WA, USA
Janelia Farm Research Campus, Howard Hughes Medical Institute, Ashburn, VA, USA
§ SuperComputing Applications and Innovation Department, Cineca - Interuniversity Consortium, Casalecchio di Reno (BO), Italy
European Laboratory for Non-Linear Spectroscopy (LENS), University of Florence, Italy
Information Engineering Department, University of Florence, Italy

Abstract—Scientific problems involving the processing of large amounts of data require the integration of suitable services and applications that facilitate research activity and interact with high performance computing resources. Easier access to these resources has a profound impact on research in neuroscience, leading to advances in the management and processing of neuro-anatomical images. An ever increasing amount of data is constantly collected, with a consequent demand for top-class computational resources to process it. In this paper, an HPC infrastructure for the management and processing of neuro-anatomical images is presented, describing the effort made to optimize and integrate specific applications in order to fully exploit the available resources.

Keywords—HPC, Data, Neuroscience, Visualisation, Human Brain Project, Confocal Microscopy

I. INTRODUCTION

Contemporary science has to tackle an ever increasing amount of data, and the biological sciences are no exception. Indeed, the automation of imaging techniques such as optical and electron microscopy is making it possible to collect larger and larger image datasets, which nowadays easily exceed one TeraByte each. In parallel with the technical developments improving the speed and throughput of data generation, new computational paradigms are needed to cope with these large datasets and extract new insights from them. The research field concerned with the computational analysis of biological images has recently been named Bioimage Informatics [1], and several tools are now available that address important problems in bioimage analysis. However, even state-of-the-art tools generally cannot cope with images whose size exceeds tens of GigaBytes, so novel tools are needed, specifically designed to operate on TeraByte-sized datasets. A further issue that arises when dealing with very large images is that the whole processing pipeline, from image acquisition to storage and retrieval, has to be carefully designed and implemented to keep resource requirements and response times within acceptable limits. In particular, high performance computing techniques have to be extensively employed to meet application requirements. To respond to the increasing complexity of manipulating and processing these very large datasets, an IT infrastructure has been set up to provide data management and high performance computing capabilities.
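To make the scale constraint concrete, the following minimal sketch illustrates the out-of-core, block-wise access pattern that TeraByte-scale tools rely on: the volume is memory-mapped and visited one block at a time, so peak memory stays bounded by a single block regardless of dataset size. This is an illustrative example only, not the infrastructure's actual implementation; the file name, volume shape, and histogram task are hypothetical.

```python
# Minimal sketch of out-of-core, block-wise processing of a volume too
# large for RAM. File name, dtype, and shape are hypothetical.
import numpy as np

SHAPE = (10_000, 20_000, 20_000)  # (z, y, x) voxels: ~4 TB at 8 bits/voxel
BLOCK = 512                       # cubic block edge; one block is ~128 MiB

# Memory-map the raw volume so that blocks are read from disk on demand
# instead of loading the whole dataset into RAM.
vol = np.memmap("brain_raw.dat", dtype=np.uint8, mode="r", shape=SHAPE)

def iter_blocks(shape, block):
    """Yield slice tuples that tile the volume with non-overlapping blocks."""
    for z in range(0, shape[0], block):
        for y in range(0, shape[1], block):
            for x in range(0, shape[2], block):
                yield (slice(z, z + block),
                       slice(y, y + block),
                       slice(x, x + block))

# Example task: accumulate a global intensity histogram one block at a
# time; memory usage is independent of the total volume size.
hist = np.zeros(256, dtype=np.int64)
for sl in iter_blocks(SHAPE, BLOCK):
    hist += np.bincount(vol[sl].ravel(), minlength=256)
```

The same traversal scheme generalizes to any per-block operation (filtering, thresholding, feature extraction), which is what makes it a natural fit for distributing work across HPC nodes.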
The data handled are mouse brain images obtained using Confocal Light Sheet Microscopy (CLSM) [2], a confocal ultramicroscopy technique in which selectively labelled neurons are imaged by light-sheet based microscopy [3][4] with micron-scale resolution. Data obtained from an experiment on a whole mouse brain (about 1 cubic centimeter imaged at micrometric resolution) can amount to 1 TeraByte or more. Specific applications have been implemented in a toolkit at the disposal of scientists in order to perform: 1) fully automated 3D stitching starting from the acquired raw data; 2) semi-automatic extraction of morphological characteristics (e.g., neuron localization) [2][5]; and 3) interactive visualization and annotation of images. Data, software tools, and processing algorithms are made available through a dedicated storage and computing infrastructure operated by Cineca [6], the largest Italian computing center. Datasets originating from the CLSM apparatus at the European Laboratory for Non-Linear Spectroscopy (LENS) [7] are transferred to Cineca using a high-performance protocol (i.e., GridFTP) and subsequently processed.
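As an illustration of the pairwise alignment step at the core of 3D stitching, the sketch below estimates the relative displacement of two overlapping image tiles via FFT-based phase correlation. This is a generic stand-in for the purpose of exposition, not the toolkit's actual stitching algorithm; the synthetic tiles and the known shift are assumptions of the example.

```python
# Minimal sketch of pairwise tile alignment via 3D phase correlation.
# A production stitcher would correlate only the nominal overlap regions
# and then solve a global placement problem over all tile pairs.
import numpy as np

def estimate_shift(tile_a, tile_b):
    """Estimate the (z, y, x) displacement of tile_b relative to tile_a."""
    f_a = np.fft.fftn(tile_a)
    f_b = np.fft.fftn(tile_b)
    # Normalized cross-power spectrum: its inverse FFT peaks at the shift.
    cross = np.conj(f_a) * f_b
    cross /= np.abs(cross) + 1e-12            # guard against division by zero
    corr = np.abs(np.fft.ifftn(cross))
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Map peak coordinates to signed shifts (FFT indices wrap around).
    return tuple(int(p) - n if p > n // 2 else int(p)
                 for p, n in zip(peak, corr.shape))

# Usage example: a synthetic tile shifted by a known (2, 5, -3) voxels.
rng = np.random.default_rng(0)
a = rng.random((32, 64, 64))
b = np.roll(a, shift=(2, 5, -3), axis=(0, 1, 2))
print(estimate_shift(a, b))                   # prints (2, 5, -3)
```

Phase correlation is attractive at this scale because a single FFT-based pass yields a sharp peak at the true offset, keeping the per-pair cost independent of the total number of tiles in the dataset.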