An FPGA Based Reconfigurable Coprocessor Board Utilizing a Mathematics of Arrays

H. Pottinger, W. Eatherton, J. Kelly, T. Schiefelbein
Department of Electrical Engineering

L. R. Mullin, R. Ziegler
Department of Computer Science

University of Missouri - Rolla

Abstract -- Work in progress at the University of Missouri-Rolla on hardware assists for high performance computing is presented. This research uses a novel field programmable gate array (FPGA) based reconfigurable coprocessor board (the Chameleon Coprocessor) to evaluate hardware architectures for speeding up array computation algorithms. These algorithms are developed using a Mathematics of Arrays (MOA). They provide a means to generate addresses for data transfers that require less data movement than more traditional algorithms; in this manner, the address generation algorithms act as an intelligent data prefetching mechanism or special-purpose cache controller. Software implementations have provided speedups on the order of 100% over classical methods for the solution of heat transfer equations on a uniprocessor. We extend these methods to application designs for the Chameleon Coprocessor.

1. Introduction

The coprocessor architecture presented in this paper is based on the concept of using an FPGA as a reprogrammable, intelligent cache controller in place of the general purpose cache controllers found in current microprocessors. The argument for a special-purpose cache controller for array processing is that, for large multidimensional arrays, the cache is utilized inefficiently and misses frequently. Additionally, for most workstations with virtual memory, the Translation Lookaside Buffer (TLB) can only accommodate several hundred KB of data [1]. Therefore TLB misses and cache misses result in more time being spent on address generation and memory access than on the actual operation being performed on the array.
In one such example, an MC88100 RISC processor required 9 instructions to compute an address and only 3 instructions to compute and assign the data for that address. Each instruction in the loop took one clock cycle to complete, so three times longer was spent on address generation than on the computation. By implementing address generation in hardware, not only can methods of optimizing the main memory address patterns be explored, but the size of the cache being used can be taken into account.

Classical array accessing introduces unnecessary computational overhead. Using MOA for hardware or software algorithms reduces the address generation overhead of array referencing and speeds up array access. MOA provides a formal way of describing array operations. Generally, these expressions are at a high level and contain Cartesian referencing. They can be reduced to a normal form that contains only the information needed to generate a linear access pattern for the array in physical memory. These patterns are quickly computable, since they contain only additions and multiplications, and they are fast because they access the array minimally to carry out the operation at hand.

2. An Introduction to MOA

A Mathematics of Arrays can be used to describe mathematical array operations regardless of their shape, size, or dimensionality. MOA describes an array calculus containing a set of operator definitions, shape definitions, and reduction rules, all based on a single indexing operator, ψ. For this reason, MOA is often referred to as the Psi Calculus. Algebraic operators are included in the Psi Calculus to form a broad set of operators needed to describe complex array operations. All the operators are extended for scalars, vectors, and multi-dimensional arrays. MOA is defined in [2] and is based on Abrams' work on the simplification of array expressions [3].
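As a concrete illustration of the reduction to a linear access pattern described above, the following minimal Python sketch (not the paper's implementation; the function name and the (start, stride, count) normal form shown here are illustrative assumptions) shows how selecting one row of a 2-D array reduces to an address stream built from only multiplications and additions:

```python
# Sketch: reducing the Cartesian reference "row r of a 2-D array" to a
# linear access pattern, assuming a row-major memory layout.
# The triple (start, stride, count) is the only information needed.

def row_access_pattern(shape, row):
    """Return the linear addresses touched when reading one row.
    Only multiplies and adds are used to generate the stream."""
    rows, cols = shape
    start = row * cols   # one multiply locates the row
    stride = 1           # rows are contiguous in row-major order
    count = cols
    return [start + stride * k for k in range(count)]

# A 2x3 array stored linearly in row-major order.
flat = [1, 2, 3, 4, 5, 6]

addrs = row_access_pattern((2, 3), 1)
print(addrs)                     # [3, 4, 5]
print([flat[a] for a in addrs])  # [4, 5, 6]
```

Because the pattern touches exactly the elements the operation needs, and no others, the array is accessed minimally, which is the property the normal form is designed to preserve.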
The advantages of expression reductions and the correspondence between Cartesian- and linear-referenced arrays are described in [4].

Table I lists some of the more useful Psi Calculus operators together with an example usage on the array ξ, a 2 × 3 array:

    ξ = 1 2 3
        4 5 6
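To make the indexing operator concrete, the following minimal Python sketch (an illustration under a row-major layout assumption, not the paper's definition or hardware) models ψ on the array ξ above. A helper `gamma` (the name used here for the Cartesian-to-linear index map) computes the linear offset with only multiplies and adds; a full index yields a scalar, while a partial index yields a subarray:

```python
# Minimal model of psi indexing on the 2x3 array xi, stored linearly
# in row-major order.

from functools import reduce

shape = (2, 3)           # shape vector of xi: <2 3>
xi = [1, 2, 3, 4, 5, 6]  # xi in row-major order

def gamma(index, shape):
    """Linear offset of a full Cartesian index (row-major):
    built from multiplies and adds only (Horner's rule)."""
    return reduce(lambda off, pair: off * pair[0] + pair[1],
                  zip(shape, index), 0)

def psi(index, arr, shape):
    """psi: a full index selects a scalar; a partial index
    selects the corresponding subarray."""
    if len(index) == len(shape):
        return arr[gamma(index, shape)]
    # Partial index: the selected subarray is contiguous in
    # row-major order, so a start offset and size suffice.
    sub_size = 1
    for d in shape[len(index):]:
        sub_size *= d
    start = gamma(index, shape[:len(index)]) * sub_size
    return arr[start:start + sub_size]

print(psi((1, 2), xi, shape))  # 6         (row 1, column 2)
print(psi((1,), xi, shape))    # [4, 5, 6] (row 1 of xi)
```

Note that partial indexing falls out of the same offset arithmetic as scalar indexing, which is why a single operator suffices for arrays of any dimensionality.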