Physics of the Earth and Planetary Interiors, 59 (1990) 195–207
Elsevier Science Publishers B.V., Amsterdam. Printed in The Netherlands

ConMan: vectorizing a finite element code for incompressible two-dimensional convection in the Earth's mantle

Scott D. King, Arthur Raefsky and Bradford H. Hager *
252-21 Seismological Laboratory, California Institute of Technology, Pasadena, CA 91125 (U.S.A.)

* Present address: Department of Earth, Atmospheric and Planetary Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, U.S.A.

(Received May 23, 1989; accepted August 2, 1989)

King, S.D., Raefsky, A. and Hager, B.H., 1990. ConMan: vectorizing a finite element code for incompressible two-dimensional convection in the Earth's mantle. Phys. Earth Planet. Inter., 59: 195–207.

We discuss some simple concepts for vectorizing scientific codes, then apply these concepts to ConMan, a finite element code for simulations of mantle convection. We demonstrate that large speed-ups, close to the theoretical limit of the machine, are possible for entire codes, not just for specially constructed routines. Although our specific code uses the finite element method, the vectorizing concepts discussed are widely applicable.

0031-9201/90/$03.50 © 1990 Elsevier Science Publishers B.V.

1 Introduction

Many large computational projects in geophysics are now being run on vector supercomputers such as the Cray X-MP. Yet more than a decade after the introduction of the Cray-1, most geophysicists simply compile their original codes, benefiting from the fast clock without taking full advantage of the vector hardware or achieving anywhere near supercomputer speed.

To illustrate, consider the Cray X-MP with a 9.5 ns clock. A floating point addition takes six clock cycles (seven clock cycles for a floating point multiplication). This leads to a theoretical peak scalar rate of 9.5 MFLOPS (million floating point operations per second), about 25 times faster than a Sun 3/260 workstation (Dongarra, 1987). The theoretical maximum for vector code on the Cray X-MP is 210 MFLOPS, over 20 times faster than scalar code and about 500 times faster than the Sun. In practice the theoretical speeds are never reached; however, speeds of 50–200 MFLOPS are attainable for many codes (Dongarra and Eisenstat, 1984).

It is often mistakenly believed that special tricks are needed to obtain vector performance from a general code. However, although vectorizing compilers are becoming more sophisticated, a code whose data structures are not suited to vector operations will not perform well on a vector computer. We show that ConMan (Convection, Mantle), a finite element code for two-dimensional, incompressible, thermal convection that uses the simple concepts we present and no special tricks, runs at up to 65 MFLOPS for the entire code on a Cray X-MP (including I/O and subroutine overhead).

Understanding vectorization is becoming even more important with the recent introduction of high-performance pipelined workstations. The pipeline architecture is similar to a vector register, and many of the same concepts from vector programming apply to obtaining maximum performance from a pipelined computer.

In the next section we discuss the basic concepts of vectorization, including chaining and unrolling. Although we illustrate vectorization with the finite element code Con-