Boosting Application-specific Parallel I/O Optimization using IOSIG

Yanlong Yin¹ (yyin2@iit.edu), Surendra Byna² (sbyna@lbl.gov), Huaiming Song¹ (hsong20@iit.edu), Xian-He Sun¹ (sun@iit.edu), Rajeev Thakur³ (thakur@mcs.anl.gov)

¹ Department of Computer Science, Illinois Institute of Technology, Chicago, Illinois
² Computational Research Division, Lawrence Berkeley National Lab, Berkeley, California
³ Mathematics and Computer Science Division, Argonne National Lab, Argonne, Illinois

Abstract: Many scientific applications spend a significant portion of their execution time accessing data from files. Various optimization techniques exist to improve data access performance, such as data prefetching and data layout optimization. However, the optimization process is usually difficult because of the complexity involved in understanding I/O behavior. Tools that simplify the optimization process are therefore of significant importance. In this paper, we introduce a tool, called IOSIG, that provides a better understanding of parallel I/O accesses and supplies information that optimization techniques can use. The tool traces the parallel I/O calls of an application and analyzes the collected information to provide a clear understanding of the application's I/O behavior. We show that the performance overheads of the tool in trace collection and analysis are negligible. The analysis step creates I/O signatures that various optimizations can use to improve I/O performance. I/O signatures are compact, easy-to-understand, parameterized representations of data access pattern information such as request size, strides between consecutive accesses, repetition, and timing. The signatures capture local I/O behavior for each process and global behavior for the application as a whole. We illustrate the usage of the IOSIG tool in data prefetching and data layout optimizations.

Keywords: Parallel I/O, I/O characterization, data access pattern, I/O optimization

I. INTRODUCTION

As high performance computing (HPC) moves towards exa-scale, efficient usage of resources in large-scale machines is a critical requirement. Efficient usage typically translates to faster scientific discovery and lower energy consumption. Improving data access performance plays a significant role in making parallel computers efficient. Since many scientific applications deal with large amounts of data, making parallel file I/O efficient has an enormous impact on making parallel applications execute faster. Typically, the execution time of a parallel program includes the time spent on computation, communication among processes, and data I/O. In many data-intensive applications, I/O performance is a significant bottleneck, leading to wasted CPU cycles and correspondingly wasted energy. In HPC systems, the gap between computing capacity and I/O performance keeps increasing because the performance of storage devices and of processors grows at very different rates. As the number of processing cores in large-scale clusters increases, the demand for accessing more data continues to grow. Hence, improving data access performance is the key to improving the efficiency of HPC applications at exa-scale.

The first step towards efficient data accesses is to understand their behavior. A few tools exist for profiling communication and computation overheads in parallel applications [1] [2] [3] [4]. However, there is a serious lack of tools for analyzing parallel I/O performance in a comprehensive manner and for converting the analyzed data into information that optimization techniques can use. The existing I/O analysis tools [5] [6] [7] [8] [9] have a limited scope of I/O characterization. A few of these tools [5] [8] collect a large amount of trace information about I/O calls and leave it to programmers to interpret. These tools do not provide the much-needed analysis step that yields a clear insight into I/O characteristics.
Without the analysis step, the I/O traces, although available, just sit idle on some server and do nothing to improve efficiency. A few other tools [6] [7] [9] provide a partial understanding of I/O behavior but also require programmer involvement in performing optimizations. This latter category of tools aims to reduce the overhead and resource requirements of collecting information about I/O calls by retrieving few details, and doing so infrequently. While they achieve the goal of low resource usage, they provide only limited insight into I/O behavior. We aim to develop an I/O characterization tool that gives a comprehensive understanding of the I/O behavior of parallel applications and paves a path towards automatic optimization of data access.

MPI-IO and parallel file systems are widely adopted in HPC systems to reduce the negative impact of the I/O gap as well as for ease of use. While MPI-IO and parallel file systems bring I/O performance to an acceptable level, there is significant scope for optimizing the overall performance of parallel I/O. Many optimization strategies have been proposed, both for data reads (such as data prefetching, two-phase collective I/O, data sieving, and data request scheduling) and for data placement and organization (such as data replication and data distribution).

Most existing I/O optimizations can benefit from knowing the I/O behavior of an application. On many occasions, designing an optimal performance improvement or choosing an optimal system configuration for performance tuning requires application-specific information. For example, in a system with data prefetching enabled, untimely or useless prefetching happens from time to time and harms I/O performance. Knowing the application's data access pattern, the prefetcher can avoid untimely and useless prefetching. Section II describes more details on this example.
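To make the prefetching example concrete, the following is a minimal sketch, under stated assumptions, of how a fixed-stride access pattern could be summarized from a trace and then used to predict the next request. It is not IOSIG's actual signature format or analysis algorithm. The names `StridedSignature`, `detect_strided_signature`, and `next_prefetch_offset` are hypothetical, and the trace is assumed to be a per-process list of (offset, size) pairs in bytes.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class StridedSignature(NamedTuple):
    """Hypothetical compact description of a fixed-stride access run."""
    start_offset: int   # file offset of the first request, in bytes
    request_size: int   # size of every request, in bytes
    stride: int         # distance between consecutive request offsets
    repetitions: int    # number of requests covered by the run


def detect_strided_signature(
    trace: Sequence[Tuple[int, int]],
) -> Optional[StridedSignature]:
    """Return a signature if every (offset, size) record in `trace`
    has the same size and the same stride; otherwise return None."""
    if len(trace) < 3:
        return None  # too few requests to call it a pattern
    first_offset, size = trace[0]
    stride = trace[1][0] - first_offset
    for (prev_off, prev_size), (off, cur_size) in zip(trace, trace[1:]):
        if prev_size != size or cur_size != size or off - prev_off != stride:
            return None
    return StridedSignature(first_offset, size, stride, len(trace))


def next_prefetch_offset(sig: StridedSignature) -> int:
    """Predict the offset of the request that would follow the run."""
    return sig.start_offset + sig.stride * sig.repetitions


# Assumed example trace: four 4 KiB reads spaced 16 KiB apart.
trace = [(0, 4096), (16384, 4096), (32768, 4096), (49152, 4096)]
sig = detect_strided_signature(trace)
```

With such a summary, a prefetcher could fetch only the block at `next_prefetch_offset(sig)` rather than guessing sequentially, and could skip prefetching entirely when no pattern is detected (the function returns `None`).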
Noticing the widespread demand for retrieving the parallel I/O access patterns of applications, we developed the IOSIG tool, which helps users understand the I/O characteristics of their