CONCURRENCY: PRACTICE AND EXPERIENCE, VOL. 1(1), 63-103 (SEPTEMBER 1989)

Parallel computing comes of age: supercomputer level parallel computations at Caltech

GEOFFREY C. FOX
Caltech Concurrent Computation Program, Mail Code 206-49, Pasadena, California 91125, USA

SUMMARY

Parallel supercomputers are now in regular use at Caltech for several major scientific calculations. We use this experience to abstract a set of lessons for applications, decomposition, performance, hardware and software. We consider hypercubes, transputer arrays and the SIMD Connection Machine CM-2 and AMT DAP. These are contrasted, where possible, with the CRAY and other high-performance conventional computers. Applications covered are lattice gauge theory, plasma physics, statistical and condensed matter physics, astronomical data analysis, quantum chemistry, graphics ray tracing, string dynamics, grain dynamics, astrophysical particle dynamics, computer chess and Kalman filters.

1. INTRODUCTION

The heart of this article is a review of twelve applications which have recently used parallel computers and achieved supercomputer level performance. We have operationally defined this as being equivalent to calculations involving a few hundred or more CRAY X-MP hours. This review is contained in Sections 3-14 and is followed by a set of lessons that we have learned from these and other experiences using concurrent computers at Caltech. It is hoped that our experience will be useful to designers of hardware and software for parallel computers, as well as to potential users of such systems. In previous reviews[1-4], we have described the broad range of applications and algorithms that successfully parallelize. However, one of the most exciting developments of the last year is the emergence of several parallel machines that have comparable or greater performance than a CRAY. The hardware used in the applications of Sections 3-14 is detailed in Table 1.
There are other interesting machines, but Table 1 represents those available at Caltech. The performance evaluation of [5], also to be published in Concurrency: Practice and Experience, has implemented some of our applications on a broader range of machines. We will refer to this paper where appropriate. In Section 2, we describe data parallelism, or domain decomposition, as a universal source of parallelism. We introduce a classification of problems by their temporal or computational structure[3], which will be helpful in succeeding sections in understanding which types of problems perform well on which computer architectures.

1040-3108/89/010063-41$20.50
© 1989 by John Wiley & Sons, Ltd.
Received June 1989