A GPU-based Architecture for Real-Time Data Assessment at Synchrotron Experiments

Suren Chilingaryan, Alessandro Mirone, Andrew Hammersley, Claudio Ferrero, Lukas Helfen, Andreas Kopmann, Tomy dos Santos Rolo, Patrik Vagovic

Abstract—Advances in digital detector technology presently lead to rapidly increasing data rates in imaging experiments. Using fast two-dimensional detectors in computed tomography, data acquisition can be much faster than reconstruction if no adequate measures are taken, especially when the high photon flux of synchrotron sources is used. We have optimized the reconstruction software employed at the micro-tomography beamlines of our synchrotron facilities to use the computational power of modern graphics cards. The main paradigm of our approach is the full utilization of all system resources. We use a pipelined architecture in which the GPUs serve as compute coprocessors reconstructing slices while the CPUs prepare the next ones. Special attention is devoted to minimizing data transfers between host and GPU memory and to executing memory transfers in parallel with the computations. We were able to reduce the reconstruction time by a factor of 30 and to process a typical data set of 20 GB in 40 seconds. The time needed for a first evaluation of the reconstructed sample is reduced significantly, and quasi real-time visualization is now possible.

Index Terms—Synchrotrons, Computed tomography, Image reconstruction, Software, High performance computing, Parallel programming, GPU computing, Performance evaluation.

I. INTRODUCTION

Driven by substantial advances in digital detector technology, X-ray imaging technologies are presently progressing rapidly, opening many applications in the fields of medical diagnostics, homeland security, non-destructive testing, materials research, and others.
X-ray imaging permits spatially resolved visualization of 2D and 3D structures in materials and organisms, which is crucial for the understanding of their properties. Furthermore, it allows one to recognize defects in devices from the macro- down to the nano-scale. Additional resolution in the time domain gives insight into the dynamics of processes, making it possible to understand the functioning of organisms and to optimize devices and technological processes.

In recent years, synchrotron tomography has seen a substantial decrease of scan durations [1]. Based on the photon flux densities available at modern synchrotron sources, ultra-fast X-ray imaging enables the investigation of the dynamics of technological and biological processes in 3D on time scales down to the millisecond range. Using modern CMOS-based pixel cameras, it is possible to reach image rates of up to several thousand frames per second. For example, frame rates of 5000 images per second were achieved [2] using a filtered white beam from the ESRF (European Synchrotron Radiation Facility) ID19 wiggler source, and a frame rate of 40000 images per second was reported [3] under different experimental conditions at ESRF beamline ID15a with a larger effective pixel size. As a result of the improved image acquisition, a given experiment can produce data sets of multiple gigabytes in a few seconds.

Manuscript received June 30, 2010; revised November 19, 2010; revised February 22, 2011.
S. Chilingaryan and A. Kopmann are with the Institute for Data Processing and Electronics, Karlsruhe Institute of Technology, Karlsruhe, Germany (telephone: +49 724 7826579, e-mail: Suren.Chilingaryan@kit.edu).
A. Mirone, A. Hammersley, and C. Ferrero are with the European Synchrotron Radiation Facility, Grenoble, France.
L. Helfen, T. dos Santos Rolo, and P. Vagovic are with the Institute for Synchrotron Radiation, Karlsruhe Institute of Technology, Karlsruhe, Germany.
It is a major challenge to process the data in a reasonable amount of time so as to facilitate on-line reconstruction. Several approaches are currently used to handle the huge data sets produced at synchrotron imaging beamlines.

• At the TopoTomo beamline of ANKA (the synchrotron facility at the Karlsruhe Institute of Technology [4]), the data has been stored in local memory, transferred to mass storage, and then processed and analyzed off-line. The data quality, and thus the success of the experiment, could only be judged with a substantial delay, which made immediate monitoring of the results impossible.

• At the ESRF, the experiments are usually monitored by distributing the reconstruction of the 3D volume onto different hosts in a cluster via a queuing system. This approach was adopted to maximize overall throughput rather than to minimize the reconstruction time of a single scan.

• A pipelined data acquisition system combining a fast detector system, high-speed data networks, and massively parallel computers is employed at the APS (Advanced Photon Source at the Argonne National Laboratory) to acquire and reconstruct a full tomogram in tens of minutes [1]. At the Paul Scherrer Institute in Switzerland, this approach was further improved, reducing the reconstruction time to just a few minutes [5]. However, supercomputer-based processing is expensive in terms of money, power consumption, and administrative effort.

Our approach exploits the computational power of modern graphics adapters, which include hundreds of simple processors originally intended to transform vertices in 3D space. These processors can be used to speed up the reconstruction. The peak performance of the fastest GPUs exceeds 1 TFlop/s for single-precision operations. High-end gaming desktops include up to four cards and provide almost 5 TFlop/s of computational power. Compared to the roughly 100 GFlop/s provided by commonly used servers, this gives a potential speedup of a factor of 50 [6], [7].
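To illustrate the pipelining paradigm underlying our approach, in which the CPUs prepare the next slice while a coprocessor reconstructs the current one, the following is a minimal sketch. It is not the actual implementation: the function names are hypothetical, the "GPU" stage is stood in by an ordinary thread so the example is self-contained, and the real system would dispatch a GPU reconstruction kernel where `run_backprojection()` appears.

```python
# Illustrative sketch of pipelined CPU/coprocessor overlap (hypothetical names).
from concurrent.futures import ThreadPoolExecutor

def preprocess(sino):
    # CPU stage: stands in for e.g. flat-field correction and filtering.
    return [2 * x for x in sino]

def run_backprojection(sino):
    # Coprocessor stage: in the real pipeline this would be a GPU kernel
    # launch; here it is a plain function so the sketch runs anywhere.
    return sum(sino)

def reconstruct(slices):
    """Overlap the stages: while slice i is being reconstructed,
    slice i+1 is already being prepared on the CPU."""
    results = []
    with ThreadPoolExecutor(max_workers=1) as gpu:  # one in-flight "GPU" job
        pending = None
        for sino in slices:
            prepared = preprocess(sino)               # CPU works ...
            if pending is not None:
                results.append(pending.result())      # ... while the job ends
            pending = gpu.submit(run_backprojection, prepared)
        results.append(pending.result())              # drain the pipeline
    return results

if __name__ == "__main__":
    print(reconstruct([[1, 2], [3, 4], [5, 6]]))  # [6, 14, 22]
```

The same pattern generalizes to the overlap of host-to-device memory transfers with computation: as long as the coprocessor queue holds one job, the CPU is free to prepare and stage the next slice instead of idling.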