MIGRATING FUNCTIONALITY FROM ROMS TO EMBEDDED MULTIPLIERS

Gareth W. Morris, George A. Constantinides, and Peter Y.K. Cheung

Imperial College, Circuits and Systems Group
Department of Electronic and Electrical Engineering
Exhibition Road, London, SW7 2BT

ABSTRACT

This poster proposes a technique, based on polynomial approximation, which can be applied to convert ROMs into a combination of arithmetic operations and smaller ROMs. We show that this technique opens up new areas of the multiplier/4LUT design space compared with existing methods.

1. BACKGROUND

Contemporary FPGA architectures include embedded RAM blocks and multipliers. Using the proposed approach, ROMs and LUTs can be converted into embedded multipliers and a small number of LUTs. The conversion uses a variation on polynomial approximation. As a result, resources in FPGAs become more fluid and different possibilities for synthesising designs become available; for instance, a high-level synthesis system can target the number of embedded MULTs or RAM blocks used.

A major drawback of a simple polynomial approximation method is the worst-case exponential approximation order. To mitigate this problem for common functions, uniform piecewise polynomial approximation [1] and the bi-/multi-partite table method [2] have often been used in the past.

2. PROPOSED APPROACH

The proposed approach uses different coefficients depending on the value of the address bus x, and each coefficient takes its values over ranges that are independent of those of the other coefficients. This is achieved with the architecture in figure 1, with implications in the function domain shown in figure 2. Instead of single hardwired coefficients, a lookup table containing many possible coefficient values is used. Each table is addressed by an input derived from a defined masking of the main address bus, which requires zero hardware overhead in an FPGA implementation.
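The coefficient selection described above can be sketched in software as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `masked_index` and `evaluate` are hypothetical, the selected address bits are assumed to be packed in order to form each table index, the coefficient values are invented for the example, and floating point stands in for the fixed-point arithmetic a hardware design would use.

```python
def masked_index(x: int, mask: int) -> int:
    """Pack the bits of x selected by mask into a contiguous table index.

    In hardware this is only a choice of wires, so it costs no logic.
    (Assumption: selected bits keep their relative order.)
    """
    index = 0
    out_bit = 0
    while mask:
        if mask & 1:
            index |= (x & 1) << out_bit
            out_bit += 1
        x >>= 1
        mask >>= 1
    return index


def evaluate(x: int, masks, tables) -> float:
    """Evaluate c_0 + c_1*x + ... + c_n*x^n, where each c_i is read from
    its own lookup table, addressed by x under that table's bit mask."""
    result = 0.0
    # Horner's rule, starting from the highest-order coefficient.
    for mask, table in zip(reversed(masks), reversed(tables)):
        result = result * x + table[masked_index(x, mask)]
    return result


# Hypothetical 3-bit address example: one mask and one table per coefficient.
MASKS = [0b000, 0b110, 0b100]
TABLES = [
    [1.0],                     # c_0: no address bits used, 1 entry
    [0.5, 0.25, 0.0, -0.25],   # c_1: two address bits used, 4 entries
    [0.125, 0.25],             # c_2: one address bit used, 2 entries
]

# Each table needs 2^(number of set mask bits) entries.
for mask, table in zip(MASKS, TABLES):
    assert len(table) == 2 ** bin(mask).count("1")

y_hat = evaluate(0b101, MASKS, TABLES)
```

In this sketch a mask of all zeros gives a single hardwired coefficient, while giving every table the same mask over the most significant bits gives equal-width segments with one full coefficient set per segment.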
Simple and uniform piecewise polynomial approximations are subsets of this technique, for certain maskings.

The coefficient values are calculated via linear program solution, which allows each coefficient to be treated separately. For example, a set of linear equations representing the design in figure 2, having bit masks of 000, 110 and 100 for each coefficient LUT respectively, has 4 coefficients in the c_1 table and 2 in the c_2 table.

[Figure 1 labels: x, address bus, arbitrary bit mask, mod 2^k, LUTs (c_0, c_1, ..., c_n), polynomial evaluation, ŷ]
Fig. 1. Architecture of the proposed technique.

[Figure 2 labels: axes ŷ and x; LUT with mask 000: c_0; LUT with mask 110: c_1, d_1, e_1, f_1; LUT with mask 100: c_2, d_2]
Fig. 2. Example showing implications of the proposed technique in the function domain.

As well as allowing different values for each coefficient used in the polynomial evaluation, we make our architecture more general by applying a modulus of value 2^k to x before evaluating the polynomial. This modulus reduces the dynamic range, and therefore the precision, required of the polynomial coefficients.

To incorporate bi- and multi-partite table methods into our design space we consider k = 0 as a special case. Normally in this case the x input to the evaluation would be constant 0; our modification is to take constant 1. The evaluation then reduces to a summation of the coefficient lookup table outputs. Bi- and multi-partite table methods are now a subset of the approach where the bit masks for each lookup table fit the relevant definition. Bi- and multi-partite table

Proceedings of the 12th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM'04) 0-7695-2230-0/04 $ 20.00 IEEE