Modelling wave propagation on parallel computers Patricia Gauzellino *,1 , Fabio Zyserman 2 , & Juan Santos 1,3 1 Departamento de Geof´ ısica Aplicada, Universidad Nacional de La Plata, Argentina 2 CONICET, Departamento de F´ ısica, Universidad Nacional de La Plata, Argentina 3 CONICET, Department of Mathematics, Purdue University Summary We present a methodology to model 2D and 3D wave propagation in the subsurface of the earth, consisting of a sequence of techniques leading to an algorithm which is specially powerful on parallel computers. Introduction In order to model 2D and 3D wave propagation effi- ciently, it is necessary to deal with large amounts of data, parameters and variables. Although modern computers have become more and more powerful, the requirement of storage and computing capacity of these problems challenge us to design algorithms specific for parallel computers. We did not want to parallelize a previously existent serial code, but we wanted to create algorithms with parallelism in their very core because we deem the latter to perform better than the former. Therefore, the algorithms design must be radically changed. We propose to use the Domain Decomposition (DD) technique, in such a way as to divide the original problem into a large number of smaller ones. The goal is clearly that each one of the processors in a parallel computer solves a set of those smaller problems. Besides, we choose to state our problems in the fre- quency domain, and integrate Fourier over the necessary frequencies (again in a parallel way) to get the response in the time domain when required (Douglas et al., 1993). As a second step we discretize every subproblem using confoming and non-conforming Finite Elements (FE), leaving the justification of this choice for the next section. We show the solutions yielded by the proposed method- ology using examples of seismic waves and a particular case of electromagnetic waves. Moreover, we analyze the quality of the parallel codes using measures such as speedup and scalability. Methodology As mentioned above, the first step of our task consists of dividing the original domain into a multiplicity of subdomains, stating the model equation in each of them. The division may be accomplished using overlapping or non-overlapping domains. We choose the second option because it avoids the increase of calculations, although it can be more difficult to implement and computationally more expensive (Kim, 1995). Naturally, we introduce consistency conditions on the artificial boundaries with which the equivalence of this new problem with the ’global’ one is guaranteed. We employ absorbing boundary conditions (ABC) on the external (or computational) borders -so as to diminish the size of the model- and rewrite for simplicity the aforementioned consistency conditions on the borders among elements as ABCs (Douglas et al., 1995; Douglas et al., 1997; Santos and Sheen, 1998). In order to get the discrete version of the problem the iterative DD technique is combined with Finite Elements procedures (FE). There are reasons to state that non-conforming FE have some advantages compared to conforming or traditional FE; among them we can mention a) Douglas et al.(2000; 2001) have proved the convergence of the iterative algorithm when using non- conforming FE. b) The number of unknowns is smaller, up to one half (Gauzellino, 1999). c)Important contrast between physical properties of the model is naturally taken into account, avoiding the use of complicated numerical artifacts (Zyserman and Santos, 1999). In this stage we hybridize (Carey and Oden, 1983) the procedure, which simplyfies the algebraic problem but introduces new variables. When the whole procedure is completed we get a large number of very small linear systems that can be solved using any standard algorithm. Parallel implementation One of the traits of the DD is to reduce the amount of information to be interchanged among processors, since only the variables lying in domains adjacent to the virtual boundaries among nodes must be transferred. In order to avoid discrepancies in the measurement of codes’ performance on parallel computers some defini- tions must be established (Foster, 1995). Let Ts(n) the time period required by the fastest known serial code to solve a problem of size n -in our case this can be the number of variables involved- on a unique processor of a parallel machine. If Tj (n, p) is the time that the j -th processor of p nodes running the parallel code needs to complete its task for the same problem size n, we define the parallel execution time by Tπ(n, p)= 1 p p j=1 Tj (n, p) (1) One measurement of the performance of the parallel code is given by the speedup, defined as the quotient between