Abstract—Multiplication is a fundamental operation in most signal and image processing applications. High-definition image processing has created a huge demand for fast, massive data processing, and shrinking CMOS processes have made silicon real estate available for the building blocks that such processing requires. We compare large-width multipliers in terms of architecture, maximum clock frequency, latency, throughput, resource usage, and power consumption. We use a flopped combinational multiplier as the baseline for our comparison, and we use the same FPGA platform throughout to keep the analysis fair. We offer some remarks and conclude that the shift-and-add architecture is the best.

Index Terms—Computer arithmetic, FPGA, low power, multiplier, Verilog.

I. INTRODUCTION

With our increasing reliance on mobile devices, a trend that continues to grow, battery life becomes more and more critical. In addition, as feature sizes in CMOS technology shrink, more silicon real estate becomes available to add functionality or improve current services. These factors mandate a detailed study of the design tradeoffs that a design engineer considers while trying to achieve specific product goals. Usually a design engineer is free to choose the details of a design as long as it delivers the required functionality at the target frequency. There is the typical tradeoff between area, speed, and power consumption.

Customers always expect more services and functionality from their mobile devices. As more transistors are squeezed into the same area, more functionality can be added to the product; this in turn limits how much real estate can be spent on improving current designs and services. This cramming of transistors and addition of new features puts a very high emphasis on low power consumption in each and every design component in order to prolong operating time, especially as leakage increases with shrinking technologies.
Power consumption is composed of three components: static, dynamic, and leakage power consumption, as shown in (1).

$$P_{Consumption} = P_{Leakage} + P_{Static} + P_{Dynamic} \tag{1}$$

CMOS processes do not exhibit static power consumption. Current technology processes suffer from leakage, and this power component will continue to grow as processes shrink.

Manuscript received September 10, 2012; revised November 29, 2012. The authors are with Varkon Semiconductors (email: asayed@varkonsemi.com; mohamed.aly@varkonsemi.com).

The dynamic power consumption depends on the switching events at every node of the circuit. One part of it is the short-circuit current that flows while transistors are switching, and the other is the charging and discharging of load capacitances. A good choice of gate sizes reduces the short-circuit component dramatically, because it produces sharp signal transitions across the circuit. The component due to charging and discharging of capacitances is shown in (2).

$$P_{Dynamic} = \sum \alpha \, C_{load} \, V^{2} F \tag{2}$$

Equation (2) states that any node that switches within the circuit consumes power directly proportional to the capacitance charged, the supply voltage, and that node's swing voltage. That said, how often a node's load charges or discharges determines its share of the total dynamic power consumption of the circuit; this is captured by the term $\alpha F$, the activity factor multiplied by the frequency, which represents the toggle rate.

Power analysis can be done either statically or dynamically. That is, the toggle rate of each node is either established statically from the known toggle rates of the inputs, or captured dynamically by running explicit simulations of the circuit and recording the internal switching for a specific simulation. Static power analysis is usually used for estimating average power. Dynamic power analysis can be used to measure the average power over a specific period and for one or more specific test cases.
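As a rough numerical illustration of (2), the following Python sketch sums $\alpha C_{load} V^{2} F$ over a handful of nodes. It is a minimal sketch, not part of the study: the node list, capacitances, activity factors, supply voltage, and clock frequency are all hypothetical values chosen for illustration, and the node swing is assumed to equal the supply voltage.

```python
# Illustrative only: evaluates equation (2), P_dynamic = sum(alpha * C_load * V^2 * F).
# Every numeric value below is hypothetical and is not a measurement from this study.
from dataclasses import dataclass


@dataclass
class Node:
    alpha: float   # activity factor of the node (assumed dimensionless, 0..1)
    c_load: float  # load capacitance of the node, in farads


def dynamic_power(nodes, v_supply, f_clk):
    """Return the switching power in watts, assuming full-swing nodes (swing == supply)."""
    return sum(n.alpha * n.c_load * v_supply ** 2 * f_clk for n in nodes)


# Three made-up nodes with different activity factors and loads.
nodes = [Node(0.5, 10e-15), Node(0.1, 20e-15), Node(0.25, 8e-15)]

p_full = dynamic_power(nodes, v_supply=1.0, f_clk=100e6)  # 1.0 V, 100 MHz
p_half = dynamic_power(nodes, v_supply=0.5, f_clk=100e6)  # same nodes at half the voltage

print(f"P_dynamic at 1.0 V: {p_full * 1e6:.3f} uW")
print(f"P_dynamic at 0.5 V: {p_half * 1e6:.3f} uW")
```

Because the voltage enters (2) squared, halving the supply in this sketch reduces the switching power to one quarter, whereas halving the activity factor or the frequency would only halve it.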
Insight into test cases of special interest, for example a usage mode that dominates the duration of operation, results in better estimates. Dynamic power analysis is also used for peak power estimation when there is high confidence that a specific test case consumes maximum power; this is very helpful in power grid design.

In wireless mobile communications, especially with mobile video on the rise, average power consumption is of significant importance. It is advertised as the expected battery operating time of mobile devices; hence we focus on average power in our study. Multiplication is inherent in the hardware implementation of any algorithm, be it in signal processing, image processing, or communications; therefore the details and optimizations of different multiplier architectures are of prime importance in the wireless communications field.

FPGAs have proven to be the fastest prototyping platform for any integrated circuit application, as shown in [1], [2]. Power consumption estimates are considerably more accurate for FPGAs than in front-end ASIC design. Therefore, an FPGA platform has been chosen for our study.

Ahmed Sayed and Mohamed Aly, "A Study of Large Width Unsigned Multipliers on FPGAs," International Journal of Computer and Electrical Engineering, Vol. 5, No. 1, February 2013, p. 44. DOI: 10.7763/IJCEE.2013.V5.659

The paper is organized as follows. In Section II we go over the different architectures considered in our study. Section III covers the comparison of implementation results with respect to specific metrics. Section IV covers our observations, remarks, and recommendations; and we wrap