THE 3-D MONTE CARLO SIMULATION OF A SEMICONDUCTOR DEVICE ON A HYPERCUBE MULTICOMPUTER *

U. RANAWAKE, C. HUSTER, P. LENDERS and S. GOODNICK
Department of Electrical and Computer Engineering
Oregon State University, Corvallis, OR 97331

Abstract

The efficient parallel implementation of a 3-D Monte Carlo device simulator is described. The parallel algorithm was implemented on a 64-node nCUBE multicomputer, and its accuracy was validated by generating the static characteristics of a MESFET. Timing measurements were made to study the variation of the speedup of the parallel program with the number of processors. We identify the sources of speedup loss and discuss several techniques for improving the speedup.

1. INTRODUCTION

The Monte Carlo technique is a numerical method for solving the Boltzmann transport equation that is considered more physically accurate than device analysis tools based on the drift-diffusion (DD) model. However, models based on this technique are very computationally intensive, and can therefore benefit greatly from the vast computational power of today's parallel processors. In this paper we consider the parallel implementation of a 3-D device simulator on a hypercube multicomputer.

2. THE PARALLEL MONTE CARLO ALGORITHM

The flowchart of a typical Monte Carlo program for device simulation is shown in Figure 1 [1]. The parallel algorithm extends the k-space Monte Carlo simulation by adding the real-space position of each simulated particle and assigning particle charge to the grid, using a cloud-in-cell scheme, so that Poisson's equation is solved self-consistently with the particle dynamics. The addition of the real-space positions necessitates a geometric partitioning of the device, in which the grid is divided into three-dimensional subgrids that are assigned to processors using a Gray code mapping. The parallel solution of Poisson's equation is based on an iterative method [2] that uses an odd/even ordering with Chebyshev acceleration.
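The odd/even-ordered iteration with Chebyshev acceleration can be sketched in serial form. The following is a minimal 2-D sketch, not the solver of [2]: the function name, the grid setup, and the Jacobi spectral-radius estimate are illustrative assumptions, following the standard red-black SOR recurrence in which the relaxation parameter is updated after every half sweep.

```python
import numpy as np

def poisson_red_black(phi, rhs, h, n_sweeps=500, tol=1e-8):
    """Red-black (odd/even ordered) SOR with Chebyshev acceleration.

    phi : 2-D array with Dirichlet boundary values preset.
    rhs : right-hand side f in  laplacian(phi) = f.
    h   : grid spacing.
    """
    n = phi.shape[0]
    # Illustrative estimate of the Jacobi spectral radius for this grid.
    rho_j = np.cos(np.pi / (n - 1))
    omega = 1.0  # first half sweep uses plain Gauss-Seidel
    for sweep in range(n_sweeps):
        max_dphi = 0.0
        for color in (0, 1):                 # 0 = "odd" sites, 1 = "even" sites
            for i in range(1, n - 1):
                j0 = 1 + (i + color) % 2     # pick the points of this parity
                for j in range(j0, n - 1, 2):
                    resid = (phi[i+1, j] + phi[i-1, j] +
                             phi[i, j+1] + phi[i, j-1] -
                             4.0 * phi[i, j] - h * h * rhs[i, j])
                    dphi = 0.25 * omega * resid
                    phi[i, j] += dphi
                    max_dphi = max(max_dphi, abs(dphi))
            # Chebyshev schedule: refine omega after every half sweep.
            if sweep == 0 and color == 0:
                omega = 1.0 / (1.0 - 0.5 * rho_j ** 2)
            else:
                omega = 1.0 / (1.0 - 0.25 * rho_j ** 2 * omega)
        if max_dphi < tol:
            break
    return phi
```

Because each half sweep touches only points of one parity, whose stencil neighbors all have the other parity, the two halves can be relaxed concurrently across processors, which is what makes this ordering attractive on a multicomputer.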
In order to make the communication during the Monte Carlo simulation more efficient, each processor maintains a small buffer region of several layers of cells surrounding its subgrid. This region, called an external interaction region [3], is used to store the potentials of grid points owned by neighboring processors as well as to hold the particles that move into these outer regions during the simulation of a time step. Communication between processors occurs during the solution of Poisson's equation, the charge assignment, the contact simulation, the statistics gathering, and at the end of each time step when transferring particles.

* Supported by NSF grant number ECS-8821107.
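The end-of-time-step particle transfer can be sketched as follows. This is a serial emulation under simplifying assumptions: a 1-D decomposition stands in for the 3-D subgrids, and the function and variable names are illustrative, not from the paper.

```python
def exchange_particles(subgrids, bounds):
    """Hand off particles that drifted out of their owner's subgrid.

    subgrids : list of particle-position lists, one per processor.
    bounds   : bounds[p] = (lo, hi), the half-open interval owned by p.
    """
    outgoing = [[] for _ in subgrids]
    for p, particles in enumerate(subgrids):
        kept = []
        for x in particles:
            lo, hi = bounds[p]
            if lo <= x < hi:
                kept.append(x)            # still inside the local subgrid
            else:
                # Find the new owner. On the hypercube this would be a
                # message to the neighboring processor that owns the
                # buffer cell the particle landed in, not a search.
                owner = next(q for q, (l, u) in enumerate(bounds)
                             if l <= x < u)
                outgoing[owner].append(x)
        subgrids[p] = kept
    # Stand-in for the end-of-time-step message exchange.
    for q, arrivals in enumerate(outgoing):
        subgrids[q].extend(arrivals)
    return subgrids
```

Batching the departures into per-neighbor lists and sending them once per time step, rather than per particle, is what keeps the communication cost of this phase low.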