International Journal of Engineering and Advanced Technology (IJEAT) ISSN: 2249 – 8958, Volume-1, Issue-6, August 2012

Abstract—Most FPGAs provide special-purpose carry propagation logic that optimizes addition operations. For small operands, redundant adders have the same delay as carry-propagate adders but require more hardware resources. Therefore, carry-save adders are not usually implemented on FPGA devices, although they are very useful in ASIC implementations. In this paper we show that it is possible to implement redundant adders with a hardware cost close to that of a carry-propagate adder. For word lengths of 16 bits and above, redundant adders are clearly faster and have an area requirement similar to that of carry-propagate adders. Among all the redundant adders studied, the 4:2 compressor is the fastest, makes the best use of the logic resources within FPGA slices, and offers the easiest way to adapt classical algorithms to fit FPGA resources efficiently. The design targets a Spartan-3E FPGA. The CSA architecture uses 1215 of the 3840 available LUTs and 96 IO blocks; the average fan-out of non-clock nets is 4.73, and the peak memory usage is 148 MB.

Index Terms—ASIC, redundant adders, FPGA.

I. INTRODUCTION

Although special-purpose ASIC designs always show better area and timing performance than FPGA designs, the use of FPGA devices has spread through the hardware design community in recent years owing to their ease of use, flexibility, reliability, low cost, and short development time. An FPGA device has a special inner structure that tries to cover the most general case of designs. Basically, an FPGA is structured as a grid of small elements that can implement basic logic operations and storage, together with routes for interconnecting these elements. In addition, other hardware resources have recently been added to accelerate some specific operations.
However, most current hardware-oriented algorithms are intended for ASIC-based chips and therefore do not take the special configuration of FPGAs into account. As a result, a large share of the hardware resources within FPGAs is wasted in many designs. For this reason, the scientific community has recently shown increasing interest in designing new algorithms that take advantage of the special inner architecture of FPGAs [1].

One of the most common operations in any design is addition. Most modern FPGA devices include special hardware for addition, mainly focused on improving the performance of carry-propagate adders (CPAs). More specifically, the carry propagation path has been optimized so that it goes from one basic element to the next over a dedicated fast route, together with specific carry logic to add and propagate the carry value. For this reason, carry-propagate adders are preferred to carry-save adders (CSAs) for implementation on FPGA devices, since, for word lengths that are not very long, CSAs have delays similar to those of CPAs but require double the number of logic resources [2]. Nevertheless, we think this is because the software tool does not manage the system resources efficiently when mapping carry-save adders onto an FPGA platform.

Manuscript received on August, 2012.
S. Ravi Chandra Kishore, M.Tech ECE Department, JNTU Kakinada University / Pydah College of Engineering and Technology / Visakhapatnam, India.
K.V. Ramana Rao, Assoc. Professor & Head, Dept. of ECE, Pydah College of Engineering & Technology, India.
In this paper we show that it is possible to implement carry-save adders on FPGA devices with a hardware cost similar to that of carry-propagate adders while keeping a constant computation time. For operands of 16 bits or more, the speed gain is notable, much as in an ASIC-based design.

II. CARRY SAVE ADDERS ON FPGA

This paper focuses mainly on the inner architecture of FPGAs with specialized carry logic and 4-input look-up tables (LUTs), such as the Xilinx Virtex 2, 4 and Spartan 2, 3. Although new-generation FPGAs have a new inner architecture, FPGAs with four-input LUTs are still widely used for medium-complexity applications because of their low cost and low power consumption.

Fig. 1 shows the architecture of a slice implementing a CPA. Each slice includes two four-input LUTs, two flip-flops, the specialized carry logic, and the necessary logic and multiplexers. These elements are connected as shown in the figure to operate as a CPA: the lower part of the slice generates a carry bit (c_{i+1}) and a sum bit (s_i) from three input bits x_i, y_i, and c_i. The carry propagation logic then passes the carry bit c_{i+1} to the upper part of the slice, where it is added to x_{i+1} and y_{i+1} to generate the next sum and carry bits, s_{i+1} and c_{i+2}. Thus, each slice accommodates the full addition of two pairs of bits.

In a carry-save adder, s_i and c_{i+1} should be computed in parallel for all bits of the input operands, independently of the input and output carries. However, as Fig. 1 shows, this is not possible between the lower and upper parts of the slice. As a result, when given a CSA HDL description, hardware design tools allocate two LUTs, one for the sum computation and one for the carry computation; that is, they assign a full slice to the whole computation of one pair of bits.
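The difference between the two schemes described above can be sketched at the bit level. The following Python model is only an illustration of the arithmetic, not of the slice mapping in Fig. 1; the function names `full_adder`, `ripple_carry_add`, and `carry_save_add` are hypothetical and do not come from the paper. It assumes unsigned n-bit operands.

```python
def full_adder(x, y, c):
    """One-bit full adder: returns (sum bit, carry-out bit)."""
    s = x ^ y ^ c
    c_out = (x & y) | (x & c) | (y & c)
    return s, c_out


def ripple_carry_add(x, y, n):
    """Carry-propagate addition of two n-bit words.

    Bit i cannot be finished until c_i from bit i-1 is known,
    so the delay grows with the word length n.
    """
    c = 0
    result = 0
    for i in range(n):
        s, c = full_adder((x >> i) & 1, (y >> i) & 1, c)
        result |= s << i
    return result | (c << n)  # final carry-out becomes bit n


def carry_save_add(x, y, z, n):
    """Carry-save addition of three n-bit words.

    Each bit position is computed independently of its neighbours,
    so no carry ripples along the word. Returns (sum_word, carry_word);
    the carry word is already shifted left by one position, so that
    x + y + z == sum_word + carry_word.
    """
    sum_word = 0
    carry_word = 0
    for i in range(n):
        s, c = full_adder((x >> i) & 1, (y >> i) & 1, (z >> i) & 1)
        sum_word |= s << i
        carry_word |= c << (i + 1)
    return sum_word, carry_word
```

In this model the loop in `carry_save_add` has no dependency between iterations, which corresponds to the constant delay of a CSA, while the loop in `ripple_carry_add` threads the carry `c` from one iteration to the next. A final carry-propagate addition of `sum_word` and `carry_word` is still needed to obtain a conventional, non-redundant result.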
In a CSA implementation on an FPGA, the sum bit and the carry-out bit are generated using two LUTs, whereas a CPA needs only one LUT. Thus, the hardware required for a carry-save adder is double that required for a CPA. Moreover, the CSA implementation does not take advantage of the carry propagation logic.

In an attempt to use the available carry logic while keeping the maximum adder delay bounded regardless of the word length, the authors of [1] present a solution based on a high-radix carry-save representation. In this representation, initially introduced to reduce the number of wires and registers required to store a value, the sum word of a carry-save number is represented in radix r (i.e., log2 r bits per digit) and the carry word requires only one bit per radix-r digit, as shown in [6]. This representation allows the use of standard CPAs to add each of the sum word radix-r digits,

Implementation of carry-save adders in FPGA
S. Ravi Chandra Kishore, K.V. Ramana Rao