GridFTP based real-time data movement architecture for x-ray photon correlation spectroscopy at the Advanced Photon Source

S. Narayanan*, T.P. Madden, A.R. Sandy
Advanced Photon Source, Argonne National Laboratory, Argonne, IL, USA
*sureshn@aps.anl.gov

Raj Kettimuthu*, Michael Link
Mathematics and Computer Science, Argonne National Laboratory, Argonne, IL, USA
*kettimut@mcs.anl.gov

Abstract—X-ray photon correlation spectroscopy (XPCS) is a unique tool for studying dynamical properties in a wide range of materials over broad spatial and temporal ranges. XPCS measures the correlated changes in the speckle pattern, produced when a coherent x-ray beam is scattered from a disordered sample, over a time series of area detector images. The technique depends on “Big Data” and relies heavily on high performance computing (HPC) techniques. In this paper, we propose a high-speed data movement architecture for moving data within the Advanced Photon Source (APS) as well as between APS and the users’ institutions. We describe the challenges involved in the internal data movement and a GridFTP-based solution that enables more efficient usage of the APS beam time. We also discuss the implementation of a GridFTP plugin, as part of the data acquisition system at the Advanced Photon Source, for real-time data transfer to the HPC system for data analysis.

Index Terms—Big Data, GridFTP, High Performance Computing, X-ray Photon Correlation Spectroscopy, Synchrotron.

I. INTRODUCTION

X-ray photon correlation spectroscopy (XPCS) is a powerful technique for probing the dynamics in materials over a wide range of length scales (micrometers to nanometers) and time scales (microseconds to hours) [1]. The dynamical phenomena that have been successfully studied using XPCS range from classical Brownian diffusion [2] in simple liquids to more complex hyper- and sub-diffusive processes that have been observed in a host of complex fluids such as gels, emulsions, and polymers [3].
One challenging area of general interest that is being studied using XPCS is the measurement of the dynamical properties of concentrated eye-lens suspensions, which helps in understanding the effect of changes in proteins on diseases such as presbyopia [4].

By its nature, the XPCS technique involves handling “Big Data” streaming at high data rates, pushing the envelope of network bandwidth, disk write and access speeds, and high performance computing (HPC) for data analysis. The dynamical time scales that can be probed are limited at one end by how fast the detector can stream images and at the other end by the total number of images collected in a single data acquisition. The state-of-the-art CCD detector that is suitable for XPCS operates continuously at 60 fps (frames per second), streaming one million (1M) pixels per frame and producing 120 MB/sec of data [5]. New detectors that are suitable for XPCS and will be available in 2013 push the data rates significantly further by streaming 1M pixels at 200 fps and 22000 fps. This enables measuring the dynamics in samples at much shorter time scales, which are relevant to understanding physiology under real conditions in systems such as cell membranes and the eye lens [4].

The notion behind science in the fast track is to acquire and analyze data in real time, so that the experimenter receives real-time feedback on the physical parameters being measured. The experimenter benefits by being able to make quick changes to the experimental conditions for optimal results.

For data analysis, we apply HPC tools established at the APS, using several compute nodes operating under a Lustre parallel file system. Often, moving the data from the acquisition system to the HPC cluster is the bottleneck. In this paper, we propose a high-speed data movement architecture for moving data within the Advanced Photon Source (APS) as well as between APS and the users’ institutions.
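As an illustrative aside, the 120 MB/sec figure quoted for the CCD detector can be checked with simple arithmetic. The sketch below assumes 2 bytes (16 bits) per pixel, a value that is not stated in the text but is consistent with the quoted rate; the function name `data_rate_mb_per_s` is hypothetical and used only for illustration.

```python
def data_rate_mb_per_s(pixels_per_frame, frames_per_second, bytes_per_pixel=2):
    """Sustained detector data rate in MB/s (1 MB = 10**6 bytes).

    bytes_per_pixel=2 is an assumption (16-bit pixels), not a value
    taken from the paper.
    """
    return pixels_per_frame * frames_per_second * bytes_per_pixel / 1e6


# CCD detector described in the text: 1M pixels per frame at 60 fps.
ccd_rate = data_rate_mb_per_s(1_000_000, 60)
print(f"{ccd_rate:.0f} MB/s")  # prints "120 MB/s"
```

Under the same assumption, the rate scales linearly with frame rate, which is why faster detectors put proportionally more pressure on the network and the file system.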
We describe the development and deployment of a GridFTP-based solution for high-throughput, high-reliability, real-time data transfer from the data acquisition PCs to the HPC cluster. The paper describes the impact on data analysis of the progress made in the data transfer pipeline, which moved from a traditional UNIX copy, to an offline GridFTP transfer, to real-time data transfer achieved by incorporating GridFTP into the data acquisition system.

The rest of the paper is organized as follows. In Section II, we provide background on the science done using XPCS. In Section III, we discuss the challenges in data movement faced at the XPCS beamline at APS. In Section IV, we provide background on GridFTP. In Section V, we present a data movement architecture and describe the development and deployment of a GridFTP-based solution and its impact on beam time usage. In Section VI, we provide a brief description of the software package Experimental Physics and Industrial Control System (EPICS) that is used to control experiments and acquire data at XPCS. We present the design of the GridFTP plugin for EPICS in Section VII. In Section VIII, we provide some results obtained using the plugin, future outlook and