Reducing Communication Overhead in Large Eddy Simulation of Jet Engine Noise

Yingchong Situ∗, Lixia Liu∗, Chandra S. Martha†, Matthew E. Louis†, Zhiyuan Li∗, Ahmed H. Sameh∗, Gregory A. Blaisdell†, Anastasios S. Lyrintzis†
∗Department of Computer Science and †School of Aeronautics and Astronautics
Purdue University, West Lafayette, United States
{ysitu, liulixia, cmartha, louism, zhiyuanli, sameh, blaisdel, lyrintzi}@purdue.edu

Abstract—Computational aeroacoustics (CAA) has emerged as a tool to complement theoretical and experimental approaches for robust and accurate prediction of sound levels from aircraft airframes and engines. CAA, unlike computational fluid dynamics (CFD), involves the accurate prediction of small-amplitude acoustic fluctuations and their correct propagation to the far field. In that respect, CAA poses significant challenges for researchers because the computational scheme should have high accuracy, good spectral resolution, and low dispersion and diffusion errors. A high-order compact finite difference scheme, which is implicit in space, can be used for such simulations because it fulfills the requirements for CAA. Usually, this method is parallelized using a transposition scheme; however, that approach has a high communication overhead. In this paper, we discuss the use of a parallel tridiagonal linear system solver based on the truncated SPIKE algorithm for reducing the communication overhead in our large eddy simulations. We report experimental results collected on two parallel computing platforms.

Keywords-finite difference methods; iterative solution techniques; linear systems; numerical algorithms; parallel algorithms

I. INTRODUCTION

Computational aeroacoustics (CAA) has emerged as a relatively new discipline and a robust and accurate tool that complements traditional theoretical and experimental approaches in the prediction of sound levels from aircraft airframes and engines.
CAA, unlike the related discipline of computational fluid dynamics (CFD), involves the accurate prediction of small-amplitude acoustic fluctuations and their correct propagation to the far field. In that respect, CAA poses significant challenges for researchers because the computational scheme should have high accuracy, good spectral resolution, and low dispersion and diffusion errors.

The state of the art in CAA prediction of far-field noise is based on time-dependent simulation of noise-generating turbulent flows coupled with integral methods for propagating the noise to the observer location. The highest level of simulation, based on the Navier-Stokes equations, is direct numerical simulation (DNS), in which time-dependent motions of all relevant length scales are resolved directly without using any turbulence model. While DNS can in theory deliver the best accuracy among mainstream methodologies for numerical CAA simulation, it suffers from the major drawback that its computational cost is prohibitive for turbulent flows at Reynolds numbers of practical engineering interest. At the opposite extreme in simulation philosophy, the Reynolds-averaged Navier-Stokes (RANS) equations model the full range of time-dependent motions of all length scales using turbulence models. In comparison with DNS, RANS significantly reduces the computational cost, but only at the expense of the flow physics. Large eddy simulation (LES) strikes a compromise between DNS and RANS; it directly resolves eddies larger than the grid scale and captures the effect of small eddies using a subgrid-scale model.
Such a methodology allows LES to use, in comparison with DNS, a coarser grid that is just fine enough to resolve large eddies, which keeps the simulation of turbulent flows at high Reynolds numbers feasible; meanwhile, the subgrid-scale model ensures that the influences of small eddies are retained even though they are not directly simulated.

At the core of numerical methods for three-dimensional LES are spatial differentiation of flow variables and spatial filtering for suppressing numerical artifacts. Both of these operations involve solving tridiagonal linear systems along the three axis directions of the computational space. In our previous efforts in developing code for three-dimensional LES, we used a transposition scheme [1] in which the computational space is transposed as necessary so that all data for each individual system are available to a single processor. This allowed us to use the tridiagonal linear system solver in LAPACK [2] to attain high accuracy as well as high efficiency, and to achieve almost perfect scalability in our previous performance experiments. Unfortunately, as computing platforms evolve and the gap between processor speed and interconnection network bandwidth widens further, the high communication overhead inherent to the transposition scheme had a significant impact on parallel performance in our most recent experiments and severely limited efficiency. In addition, when a one-dimensional partitioning is used, the transposition scheme limits the number of processors to at most the number of planes in a given direction. This prompted us to investigate alternatives to the transposition scheme. Among a multitude of possible choices, we chose to eliminate the transposition of the computational space by employing a