Bit-Level Pipelinable General and Fixed Coefficient Digit-Serial/Parallel Multipliers Based on Shift-Accumulation

Oscar Gustafsson and Lars Wanhammar
Department of Electrical Engineering
Linköping University, SE-581 83 Linköping, SWEDEN
E-mail: {oscarg, larsw}@isy.liu.se

ABSTRACT

In this work we introduce a novel approach to digit-serial/parallel multiplication. This general class of multipliers is based on shift-accumulation, which also makes the approach suitable for implementation of shift-accumulators in distributed arithmetic. The maximal number of cascaded full-adders can be selected as a variable in the design process. Thus, a bit-level pipelined multiplier can be obtained as a special case. Both general and fixed coefficient multiplication are considered. The hardware complexity is low compared with other approaches.

1. INTRODUCTION

Digit-serial processing techniques have received considerable attention during the last decade [1][2]. In digit-serial processing a number of bits of the input word, a digit, is processed in parallel. If the digit-size, i.e., the number of bits processed concurrently, is one, the digit-serial system reduces to a bit-serial system, while for a digit-size equal to the word length the system reduces to a bit-parallel system. The motivation for digit-serial processing is to find an optimum trade-off between the low area of bit-serial processing and the high processing power of bit-parallel processing. As low power consumption has also been a key interest in recent years, it is a further figure of merit to optimize.

Traditionally, digit-serial multipliers have been obtained either via unfolding of a bit-serial multiplier [3] or via folding of a bit-parallel multiplier [4]. The problem with these approaches is that the resulting circuits are not pipelinable at the bit level. The reason is illustrated in Fig. 1, which shows a digit-serial adder obtained via unfolding.
The recursive loop prohibits the insertion of pipelining to reduce the critical path to less than d full-adders. Solutions to this problem have been proposed in a number of papers [5]-[7]. These solutions and our proposed solution are based on a redundant intermediate representation and a pipelined digit-serial adder at the output.

In this work we will present a novel method for obtaining digit-serial/parallel multipliers with an arbitrarily short critical path. The method is based on shift-accumulation and is thus also suitable for designing shift-accumulators used in distributed arithmetic. Both general and fixed coefficient multipliers will be considered.

2. PROPOSED STRUCTURES

Starting from the bit-serial multiplier in Fig. 2, we will derive the principles of a digit-serial multiplier and introduce our proposed multiplier structure. We will start with unsigned multiplication and then extend to signed multiplication.

Each clock cycle, the multiplier in Fig. 2 inputs a bit from the input word and the partial products are formed in the AND-gates. These partial products are accumulated in the adders and shifted at the next clock cycle. Considering a dot representation of the partial products of the complete multiplication, each clock period corresponds to a row in the partial product matrix [8].

The multiplier can be unfolded d times to obtain a digit-serial implementation with digit-size d. Each clock period, d bits of the input word are applied to the input and d rows of the partial product matrix are added to the previous product stored in the registers. The different partial products in the operation are shown in Fig. 3. The top rectangle corresponds to the partial products forming the intermediate sum in a carry-save representation. The parallelogram corresponds to the partial products of the current input digit.

The main idea in this work is that instead of reducing the partial products to a maximum of two values as shown in Fig. 3, we allow an arbitrary number of registers. Thus, the registers can be used both for pipelining and for storing the intermediate values in the shift-accumulation process. By limiting the height of the reduction tree in the accumulator to h levels, the critical path will contain at most h full-adders. The partial products in this generalized shift-accumulation are shown in Fig. 4. The corresponding implementation is shown in Fig. 5. Note that the number of full-adder levels, h, is equal to the pipeline level. Therefore, selecting h = 1 leads to a bit-level pipelined implementation.

Fig. 1. Digit-serial adder obtained from unfolding.

Fig. 2. Bit-serial/parallel multiplier.

Fig. 3. Partial products in the unfolded digit-serial/parallel multiplier.
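The generalized shift-accumulation of Section 2 can be illustrated with a small word-level behavioral model. The following Python sketch is not the circuit of Fig. 5; it is a minimal model under stated assumptions: unsigned operands, an input word length that is a multiple of d, each register row held as one integer, and the pipelined digit-serial output adder replaced by ordinary integer addition. The names `compress_3_2` and `shift_accumulate_multiply` are hypothetical and chosen for illustration only.

```python
def compress_3_2(rows):
    """One level of full-adders: every group of three rows becomes two rows."""
    reduced = []
    for i in range(0, len(rows) - 2, 3):
        x, y, z = rows[i:i + 3]
        reduced.append(x ^ y ^ z)                          # sum bits
        reduced.append(((x & y) | (x & z) | (y & z)) << 1)  # carry bits
    # Rows that do not fill a group of three pass through unchanged.
    reduced.extend(rows[len(rows) - len(rows) % 3:])
    return reduced


def shift_accumulate_multiply(a, x, x_bits, d, h):
    """Multiply parallel coefficient a by x, fed LSB-first d bits per cycle.

    h is the number of full-adder levels allowed per clock cycle.
    Returns (product, max_rows), where max_rows is the largest number of
    rows held in registers after any cycle (an assumption of this model).
    """
    assert x_bits % d == 0, "model assumes word length is a multiple of d"
    mask = (1 << d) - 1
    rows = []       # redundant intermediate value kept in registers
    out = 0         # behavioral stand-in for the output digit-serial adder
    max_rows = 0
    for k in range(x_bits // d):
        # AND-gates: d new rows of the partial product matrix.
        for i in range(d):
            bit = (x >> (k * d + i)) & 1
            rows.append((a << i) if bit else 0)
        # At most h cascaded full-adders in the critical path.
        for _ in range(h):
            rows = compress_3_2(rows)
        max_rows = max(max_rows, len(rows))
        # The d least significant bits of every row leave the accumulator.
        out += sum(r & mask for r in rows) << (k * d)
        rows = [r >> d for r in rows]
    # Flush the value that remains in the registers.
    product = out + (sum(rows) << x_bits)
    return product, max_rows


product, regs = shift_accumulate_multiply(a=13, x=11, x_bits=4, d=2, h=1)
assert product == 13 * 11
```

In this model, d = 1 with h = 1 settles at two stored rows, which matches the familiar carry-save form of the bit-serial/parallel multiplier. A larger d with a small h trades additional stored rows for a shorter critical path.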