MIGRATING FUNCTIONALITY FROM ROMS TO EMBEDDED MULTIPLIERS

Gareth W. Morris, George A. Constantinides, and Peter Y.K. Cheung

Imperial College, Circuits and Systems Group
Department of Electronic and Electrical Engineering
Exhibition Road, London, SW7 2BT

ABSTRACT

This poster proposes a technique, based on polynomial approximation, which can be applied to convert ROMs into a combination of arithmetic operations and smaller ROMs. We show that this technique highlights new areas of the multiplier/4LUT design space over existing methods.

1. BACKGROUND

Contemporary FPGA architectures include embedded RAM blocks and multipliers. Using the suggested approach, it is possible to convert ROMs and LUTs into embedded multipliers and a small number of LUTs. The conversion uses a variation on polynomial approximation. As a result, resources in FPGAs become more fluid and different possibilities for synthesising designs become available; for instance, a high level synthesis system can target the number of embedded MULTs or RAM blocks used.

A major drawback of a simple polynomial approximation method is the worst case exponential approximation order. To mitigate this problem for common functions, uniform piecewise polynomial approximation [1] or bi-/multi-partite table methods [2] have often been used in the past.

2. PROPOSED APPROACH

The proposed approach uses different coefficients depending on the address bus x, and these coefficients exist over independent ranges. This is achieved with the architecture in figure 1, with implications in the function domain shown in figure 2. Instead of single hardwired coefficients, a lookup table containing many possible coefficient values is used. These coefficients are selected via an address bus input derived by a defined masking of the main address bus, requiring zero hardware overhead in an FPGA implementation.
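
The coefficient selection described above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the authors' implementation: the names `masked_index` and `evaluate`, the table contents, and the use of plain integers in place of fixed-point hardware arithmetic are all hypothetical. The table sizes follow the figure 2 example (masks 000, 110 and 100), and the mod 2^k reduction, with the k = 0 special case, follows the description given later in this section.

```python
def masked_index(x, mask):
    """Gather the bits of x selected by mask into a compact table index.

    In hardware this is pure wiring: the masked address lines are
    simply routed to the coefficient LUT's address inputs.
    """
    idx = 0
    out_bit = 0
    bit = 0
    while mask >> bit:
        if (mask >> bit) & 1:
            idx |= ((x >> bit) & 1) << out_bit
            out_bit += 1
        bit += 1
    return idx


def evaluate(x, tables, masks, k):
    """Evaluate the polynomial with per-address coefficients.

    tables[i] holds the candidate values of coefficient i and
    masks[i] selects which address bits index that table.
    """
    # Reduce x modulo 2^k; for k = 0 the constant 1 is used instead of 0,
    # so the result collapses to a sum of the table outputs.
    x_red = x % (1 << k) if k > 0 else 1

    coeffs = [table[masked_index(x, mask)]
              for table, mask in zip(tables, masks)]

    # Horner's rule: c_0 + x_red * (c_1 + x_red * (c_2 + ...))
    y = 0
    for c in reversed(coeffs):
        y = y * x_red + c
    return y


# Illustrative data only (assumed values): a 3-bit address bus with
# masks 000, 110 and 100, giving tables of 1, 4 and 2 entries.
masks = [0b000, 0b110, 0b100]
tables = [[3], [1, 2, 4, 8], [0, 1]]

if __name__ == "__main__":
    for x in range(8):
        print(x, evaluate(x, tables, masks, k=3))
```

With an all-zero mask every address maps to the same single entry, which corresponds to a hardwired coefficient; widening a mask splits the input range into more independent pieces for that coefficient only.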
Simple and uniform piecewise polynomial approximations are subsets of this technique, for certain maskings.

The coefficient values are calculated via linear program solution, which allows each coefficient to be treated separately. For example, a set of linear equations representing the design in figure 2, having bit masks of 000, 110 and 100 for each coefficient LUT respectively, has 4 coefficients in the $c_1$ table and 2 in the $c_2$ table.

[Fig. 1. Architecture of the proposed technique. Labels: address bus $x$, arbitrary bit mask, coefficient LUTs ($c_0$, $c_1$, ..., $c_n$), mod $2^k$, polynomial evaluation, output $\hat{y}$.]

[Fig. 2. Example showing implications of the proposed technique in the function domain. Labels: LUT with mask 000 ($c_0$), LUT with mask 110 ($c_1$, $d_1$, $e_1$, $f_1$), LUT with mask 100 ($c_2$, $d_2$); axes $x$ and $\hat{y}$.]

As well as allowing different values for each coefficient used in the polynomial evaluation, we make our architecture more general by applying a modulus of value $2^k$ to x before evaluating the polynomial. This modulus reduces the dynamic range, and therefore the precision, required by the polynomial coefficients.

To incorporate bi- and multi-partite table methods into our design space we consider $k = 0$ as a special case. Normally in this case the x input to the evaluation would be constant 0; our modification is to take constant 1. The evaluation then reduces to a summation of the coefficient lookup table outputs. Bi- and multi-partite table methods are now a subset of the approach when the bit masks for each lookup table fit the relevant definition.

Proceedings of the 12th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM'04) 0-7695-2230-0/04 $20.00 IEEE

Bi- and multi-partite table