INNOVATIVE APPROACHES TO PARALLELIZING FINITE-DIFFERENCE TIME-DOMAIN COMPUTATIONS

Dmitry A. Gorodetsky and Philip A. Wilsey
University of Cincinnati, P. O. Box 210030, Cincinnati, Ohio 45221-0030 USA
Email: goroded@email.uc.edu

INTRODUCTION

With increasing frequencies and reduced grid sizes, numerical schemes such as FDTD take an excessive amount of time to complete their computations [1]. Numerous efforts have been made to speed up the execution of this algorithm by utilizing parallel processors [2-6]. The conventional approach [7] has been to partition the FDTD domain into equal-sized sub-domains whose boundaries are Yee cells, as shown in Figure 1. Due to the nature of FDTD, in order to calculate the E field values on the boundary, the sub-domains on both sides must transmit their outer H field values. This results in idle time whenever one processor is faster than its neighbor and has to wait for the message.

The partitioning can be done in 1, 2, or 3 dimensions. Every side of a sub-domain that lies on a partition boundary has to communicate with the neighboring processor. The smaller the partition, the faster the surface-to-volume ratio (S/V) grows. This is a drawback of the parallel processing approach, because the communication-to-computation ratio is proportional to S/V. At some point, this ratio becomes so large that further partitioning yields no additional gain.

Fixed speed-up measures how much faster the algorithm runs as it is divided among processors while the problem size remains the same. It is written as [1]:

$$S_f(P) = \frac{s + p}{s + w(P) + p/P} \tag{1}$$

where $P$ is the number of processors, $s + p$ is the total time to run the entire algorithm on a single processor, split into its serial portion $s$ and parallel portion $p$, and $w(P)$ is the additional work required to make the algorithm run on $P$ processors. Clearly, the larger the serial portion, the lower the fixed speed-up.
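
The behavior of Eq. (1) can be explored numerically. The following is a minimal sketch, not part of the original code: the function names, the timings s = 1 and p = 9, and the linear overhead model w(P) = c(P - 1) are all illustrative assumptions, chosen only to show how the overhead term limits the achievable speed-up.

```python
def fixed_speedup(s, p, P, w=lambda P: 0.0):
    """Fixed speed-up of Eq. (1): (s + p) / (s + w(P) + p / P).

    s : serial portion of the single-processor run time
    p : parallel portion of the single-processor run time
    P : number of processors
    w : callable returning the extra work on P processors
        (defaults to zero overhead)
    """
    return (s + p) / (s + w(P) + p / P)


def linear_overhead(c):
    """Hypothetical overhead model w(P) = c * (P - 1); an assumption
    made for illustration, not a model taken from the paper."""
    return lambda P: c * (P - 1)


if __name__ == "__main__":
    s, p = 1.0, 9.0            # assumed timings, arbitrary units
    w = linear_overhead(0.1)   # assumed overhead coefficient
    for P in (1, 2, 4, 8, 16, 32):
        ideal = fixed_speedup(s, p, P)
        with_overhead = fixed_speedup(s, p, P, w)
        print(f"P={P:2d}  no overhead: {ideal:5.2f}  with overhead: {with_overhead:5.2f}")
```

Under these assumed numbers, the overhead-free speed-up rises toward the limit (s + p)/s as P grows, whereas with the linear overhead the speed-up peaks at around P = 9 to 10 and then declines, which mirrors the observation that further partitioning eventually stops paying off.
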
Another useful metric is the parallel efficiency. It measures the percentage utilization of the available parallelism and is simply the speed-up $S(P)$ divided by the number of processors, $P$.

IMPLEMENTATION

A sequential version of FDTD has been implemented by the research group at the University of Minnesota [8]. This code simulates an electromagnetic wave propagating in a waveguide, using a mesh with dimensions $n_x \times n_y \times n_z = 133 \times 20 \times 9$ unit cells, and the solution is computed for 1000 time steps. We converted this code to run in parallel and implemented it on a Beowulf parallel