Multiple-Phase Collective I/O Technique for Improving Data Access Locality

David E. Singh, Florin Isaila, Alejandro Calderón, Félix García and Jesús Carretero
Computer Science Department
Carlos III University of Madrid
28911 Leganés, Spain
{desingh,florin,acaldero,fgarcia,jcarrete}@arcos.inf.uc3m.es

Abstract

This paper presents Multiple-Phase Collective I/O, a novel collective I/O technique for distributed memory multiprocessors. Multiple-Phase Collective I/O is a refinement of the two-phase collective I/O technique. The communication phase is structured into several steps, which progressively increase the locality of the data to be written to a file system. Besides describing Multiple-Phase Collective I/O, this paper addresses two additional objectives. First, we aim to improve the efficiency of the Sulphur Transport Eulerian Model 2 (STEM-II) application. STEM-II is an air quality model that simulates transport, chemical transformations, emission and deposition processes in a unified framework. Due to the large amount of data processed, I/O becomes a critical factor for application performance. Multiple-Phase Collective I/O considerably enhances the performance of the I/O stage in particular and, consequently, of the whole application. The second objective is to evaluate and compare the performance of Multiple-Phase Collective I/O with that of other well-known parallel I/O techniques.

1 Introduction

Nowadays, air pollution in highly populated or industrial areas is a topic of increasing social interest. In particular, simulation tools are especially useful for providing feedback mechanisms that allow pollutant levels to be limited. STEM-II [2] is an air quality model that simulates transport, chemical transformations, emission and deposition processes in an integrated framework.
This model was successfully used to control the emissions of pollutants produced by the Endesa power plant of As Pontes (Spain). In addition, STEM-II was chosen as a case study in the European CrossGrid project, which shows its relevance to the scientific community from an industrial point of view as well as its suitability for high performance computing.

In terms of application performance, STEM-II is a computationally intensive application that requires a multiprocessor environment to perform simulations within a reasonable response time. In [8], several parallelization approaches were presented, showing that STEM-II can be executed efficiently in a multiprocessor environment. In [3], the parallelization of the I/O stage was studied. Several I/O techniques were evaluated and compared, demonstrating the importance of the I/O stage for overall application performance.

In this paper we present a novel collective I/O technique, called Multiple-Phase Collective I/O. This technique consists of two phases: a communication phase, in which processors exchange subdomains of their assigned data, and an output phase, in which data are transferred to disk with a high degree of locality. The communication phase groups processors and communications into several stages, according to a hierarchical scheme. This approach maximizes parallelism (allowing independent communication operations to overlap), thus drastically reducing the communication cost. The communication phase increases the data locality of output operations, enhancing their performance. The overall result is an efficient collective I/O technique.

This paper is organized as follows: Section 2 describes the air quality model STEM-II, presenting its internal structure as well as its parallel implementation. Section 3 gives an overview of parallel I/O techniques. Section 4 introduces our proposal.
Its performance is analyzed in Section 5, where it is compared with other parallel I/O techniques. Finally, Section 6 summarizes the main conclusions of this work.

2 STEM-II air quality model

STEM-II is a 3D grid-based model that simulates SOx/NOx/RHC multiphase chemistry, long-range transport and dry plus wet acid deposition. This application is

15th EUROMICRO International Conference on Parallel, Distributed and Network-Based Processing (PDP'07) 0-7695-2784-1/07 $20.00 © 2007