Physics of the Earth and Planetary Interiors, 59 (1990) 195–207
Elsevier Science Publishers B.V., Amsterdam - Printed in The Netherlands
ConMan: vectorizing a finite element code for incompressible
two-dimensional convection in the Earth’s mantle
Scott D. King, Arthur Raefsky and Bradford H. Hager *
252-21 Seismological Laboratory, California Institute of Technology, Pasadena, CA 91125 (U.S.A.)
(Received May 23, 1989; accepted August 2, 1989)
King, S.D., Raefsky, A. and Hager, B.H., 1990. ConMan: vectorizing a finite element code for incompressible
two-dimensional convection in the Earth’s mantle. Phys. Earth Planet. Inter., 59: 195–207.
We discuss some simple concepts for vectorizing scientific codes, then apply these concepts to ConMan, a finite
element code for simulations of mantle convection. We demonstrate that large speed-ups, close to the theoretical limit
of the machine, are possible for entire codes, not just specially constructed routines. Although our specific code uses the
finite element method, the vectorizing concepts discussed are widely applicable.
* Present address: Department of Earth, Atmospheric and Planetary Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, U.S.A.
0031-9201/90/$03.50 © 1990 Elsevier Science Publishers B.V.
1 Introduction
Many large computational projects in geophysics are now being run on vector supercomputers such as the Cray X-MP, yet after more than a decade since the introduction of the Cray-1, most geophysicists simply compile their original codes, making use of the fast clock, without taking full advantage of the vector hardware or achieving anywhere near supercomputer speed.
To illustrate, let us consider the Cray X-MP with a 9.5 ns clock. It takes six clock cycles to compute a floating point addition (7 clock cycles for a floating point multiplication). This leads to a theoretical peak scalar rate of 9.5 MFLOPS (million floating point operations per second). This is about 25 times faster than a Sun 3/260 workstation (Dongarra, 1987). The theoretical maximum for vector code on the Cray X-MP is 210 MFLOPS, over 20 times faster than the scalar code and 500 times faster than the Sun. In reality, the theoretical speeds are never reached; however, speeds of 50-200 MFLOPS are attainable for many codes (Dongarra and Eisenstat, 1984).
It is often mistakenly believed that for a general code, special tricks are needed to obtain vector performance. However, although vectorizing compilers are becoming more sophisticated, a code that does not have data structures suitable for vector operations will not perform well on a vector computer. We show that ConMan (Convection, Mantle), a finite element code for two-dimensional, incompressible, thermal convection, which uses the simple concepts we present and no special tricks, runs up to 65 MFLOPS for the entire code on a Cray X-MP (including i/o and subroutine overhead).
Understanding vectorization is becoming even more important because of the recent introduction of high-performance pipelined workstations. The pipeline architecture is similar to a vector register, and many of the same concepts from vector programming apply to obtaining the maximum performance from a pipelined computer.
In the next section we discuss the basic concepts of vectorization, including the concepts of chaining and unrolling. Although we illustrate vectorization with the finite element code Con-