Evaluation of a Performance Model of the Lustre File System
Tiezhu Zhao¹, Verdi March²,³, Shoubin Dong¹, Simon See²,⁴
¹ Guangdong Key Laboratory of Computer Network, South China University of Technology, Guangzhou, 510641, P.R. China
² Asia-Pacific Science and Technology Center (APSTC), Sun Microsystems
³ Department of Computer Science, National University of Singapore
⁴ Department of Mechanical & Aerospace Engineering, Nanyang Technological University
zhao.tiezhu@mail.scut.edu.cn, Verdi.March@Sun.COM, sbdong@scut.edu.cn, Simon.See@Sun.COM
Abstract—As a large-scale global parallel file system, the Lustre file system plays a key role in High Performance Computing (HPC) systems, yet its potential performance can be difficult to predict because the impact on application performance is not clearly understood. It is therefore important to gain insight into the I/O efficiency that a Lustre file system can deliver, and into which factors affect its performance and how. This paper presents a performance evaluation of the Lustre file system and proposes a novel relative performance model to predict overhead under different performance factors. In our previous experiments, we found that different performance factors are closely correlated. To exploit these correlations, we introduce a relative performance model that predicts the performance difference between a pair of Lustre file systems configured with different performance factors. On average, the relative model predicts bandwidth within 17%-28%. The results show that our relative prediction model achieves good prediction accuracy.
Keywords-performance evaluation; parallel file system; model; Lustre
I. INTRODUCTION
Parallel file systems are a key part of any complete massively parallel computing environment and are widely used in clusters dedicated to I/O-intensive parallel applications. The Lustre parallel file system is best known for powering seven of the ten largest high-performance computing (HPC) clusters in the world, with tens of thousands of client systems, petabytes (PBs) of storage, and hundreds of gigabytes per second (GB/s) of I/O throughput. Many HPC sites use Lustre as a site-wide global file system, servicing dozens of clusters on an unprecedented scale [1]. In the current cloud computing era, research on the performance of the Lustre file system has increasingly attracted the attention of both industry and research communities. Further details on Lustre are available in [9][10][11][12].
The rapid development of HPC applications is aggressively pushing the demands on parallel file systems in terms of high aggregate I/O bandwidth, massive storage capacity, and strong fault tolerance. HPC platforms need to be coupled with efficient parallel file systems, such as Lustre, that can deliver commensurate I/O throughput to scientific applications. Although the performance characteristics of HPC workloads have been studied through experimental and empirical analysis, the potential performance of such systems remains difficult to predict: the impact on application performance in a parallel file system environment is not clearly understood, and most internal details of the basic components of parallel file systems are not public. It is therefore important to gain insight into the deliverable I/O efficiency of the Lustre file system.
As is well known, building a parallel file system is expensive and complex. When a parallel file system is not properly tuned or configured, this cost may not pay off. Consequently, questions of how to optimize the design of a parallel file system, how to evaluate and tune its performance, and how to predict its performance trends are of growing concern to both the storage industry and research communities.
Storage systems can be complex to manage. Management involves sizing storage devices (volumes or LUNs), selecting a RAID level for each device, and mapping application workloads to storage devices. Automating management is one way to offer administrators some relief and to help reduce the total cost of ownership in the data center; in particular, the mapping of workloads to storage devices could be automated. However, storage administration currently continues to be overly complex and costly, and faces many challenges in deciding the mapping from application data sets to storage devices, balancing loads, and matching workload characteristics to device strengths. Unfortunately, storage administration today relies on experts who use rules of thumb to make decisions [18][19][20]. What is needed is a mechanism that predicts the performance of any given workload and automates the prediction process.
Based on the above observations, we propose an in-depth performance evaluation of the Lustre file system. Our evaluation mainly covers the number of object storage servers (OSSes), the storage connection approach, the type of disks, the type of journal for each OST, and the number of threads per OST. We perform a correlation analysis of the performance overhead under these different performance factors and find that they are closely correlated. To mine these potential performance correlations, we construct a novel relative performance prediction model that predicts performance under different factors.
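The idea of relative prediction can be illustrated with a minimal sketch: measure the bandwidth of a baseline configuration, then scale it by per-factor relative ratios to estimate a differently configured system. The function name, factor names, and ratio values below are illustrative assumptions, not the paper's actual model or measured data.

```python
# Hypothetical sketch of a relative performance model. The factor names
# and ratios are assumed for illustration; the paper's model is derived
# from measured correlations between performance factors.
def relative_predict(baseline_bw, factor_ratios):
    """Estimate target bandwidth by scaling a measured baseline
    bandwidth with the relative ratio of each changed factor."""
    predicted = baseline_bw
    for ratio in factor_ratios.values():
        predicted *= ratio
    return predicted

# Example: a baseline system measured at 400 MB/s; the target doubles
# the OSS count (assumed speedup ratio 1.8) and uses faster disks
# (assumed ratio 1.2).
ratios = {"oss_count": 1.8, "disk_type": 1.2}
estimate = relative_predict(400.0, ratios)
print(round(estimate, 1))  # 864.0
```

The appeal of a relative model is that it sidesteps absolute modeling of opaque file system internals: only the ratio between a pair of configurations must be characterized, which is what the correlation analysis in this paper provides.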
The remainder of this paper is organized as follows. We first introduce related work in Section II. Then, we conduct an in-depth survey of the performance factors of the Lustre file system in Section III. In Section IV, we present the relative performance prediction model and carry out a detailed analysis of its prediction accuracy. We conclude the paper in Section V.
The Fifth Annual ChinaGrid Conference
978-0-7695-4106-8/10 $26.00 © 2010 IEEE
DOI 10.1109/ChinaGrid.2010.38