IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 6, NO. 1, JANUARY 2007

Measuring Improvement when Using HUB Formats to Implement Floating-Point Systems under Round-to-Nearest

Javier Hormigo and Julio Villalba, Member, IEEE

Abstract—This paper analyzes the benefits of using HUB formats to implement floating-point arithmetic under round-to-nearest mode from a quantitative point of view. Using HUB formats to represent numbers allows the removal of the rounding logic of arithmetic units, including sticky-bit computation. This is shown for floating-point adders, multipliers, and converters. Experimental analysis demonstrates that HUB formats and the corresponding arithmetic units maintain the same accuracy as conventional ones. Moreover, the implementation of these units, based on basic architectures, shows that HUB formats simultaneously improve area, speed, and power consumption. Specifically, based on data obtained from synthesis, a HUB single-precision adder is about 14% faster and consumes 38% less area and 26% less power than the conventional adder. Similarly, a HUB single-precision multiplier is 17% faster, uses 22% less area, and consumes slightly less power than the conventional multiplier. At the same speed, the adder and multiplier achieve area and power reductions of up to 50% and 40%, respectively.

Index Terms—floating-point-arithmetic, digital-arithmetic, optimization, power-consumption, adders, multiplication

I. INTRODUCTION

The rounding operation is performed in almost all arithmetic operations involving real numbers. There are several ways to perform this operation, although unbiased round-to-nearest has the best characteristics [1], [2]. It provides the closest possible number to the original exact value, but if the exact value lies exactly halfway between two representable numbers, then one of them is selected randomly.
The most commonly used approach is the tie-to-even method, which is the default mode of the IEEE 754 floating-point standard [3]. However, the implementation of this rounding mode is relatively complex, and the area and delay introduced by the rounding circuits may be very large, since they normally lie on the critical path. For this reason, it is generally used only in floating-point (FP) circuits.

Many researchers have proposed different architectures to reduce the impact of this delay by merging rounding with other operations or removing it from the critical path. For instance, an FP adder was proposed in [4] in which, if the result of an addition is the input to another addition, the increment required for rounding up is postponed until the next operation. In [5], a dedicated circuit that computes the sticky bit in parallel with the main path was proposed with the aim of accelerating the implementation of multiplication. A compound adder (a circuit which, given a carry-save input, delivers both the result and the result plus one) was proposed in [6] to generate the rounded result of any operation. In [7], three different methods were compared for multipliers, which simplify rounding decisions and merge the rounding up with the computation of the operation. Similarly, [8] proposed combining rounding with the final addition that converts the carry-save result to conventional representation. In [9], a rounding scheme for high-speed multipliers based on a rounding table and prediction was presented.

The authors are members of the Department of Computer Architecture, Universidad de Málaga, Málaga E-29071, Spain (e-mail: fjhormigo@uma.es). This work was supported in part by the Ministry of Education and Science of Spain under contract TIN2013-42253-P.

A totally different approach would be to use a new real-number encoding in order to simplify the implementation of round-to-nearest.
Thus, the problem would change from optimizing the rounding operation to dealing with arithmetic operations under the new number representation. This proposal is found in [10] with Round-to-Nearest representations (RN-representations) and in [11] with Half-Unit Biased (HUB) formats. Together with other advantages, these new formats allow round-to-nearest to be performed simply by truncation. Moreover, these new formats are based on simple modifications of conventional formats and so could be applied to practically any conventional format. In this article, we focus on HUB FP formats.

The efficiency of using HUB formats for fixed-point representation has been demonstrated in [12] and [13]. By reducing bit-width while maintaining the same accuracy, the area cost and delay of FIR filter implementations were dramatically reduced in [12], and similarly for the QR decomposition in [13]. In this article, we perform a quantitative estimation of the benefit obtained by using HUB formats to implement FP computation systems under round-to-nearest. Some preliminary results for half-precision FP adders and multipliers were presented in [14]. That previous work shows that the area and power consumption of a basic FP adder could be improved by up to 70% at high frequencies when using HUB formats, whereas they remain the same for the basic FP multiplier. In addition to a deeper analysis, in this article we extend these results to other sizes and circuits, such as converters.

In comparison to previous work, the main contributions of this article are:
• A detailed architecture for basic adder and multiplier to deal with HUB numbers
• A study of the conversions between different FP formats and the corresponding architectures
• The experimental comparison of accuracy between HUB

This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication.
The final version of record is available at http://dx.doi.org/10.1109/TVLSI.2015.2502318 Copyright (c) 2015 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing pubs-permissions@ieee.org.
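
The Introduction states that HUB formats allow round-to-nearest to be performed simply by truncation. The following Python fragment is a minimal sketch of that idea for a fixed-point significand, not the architecture proposed by the authors. It assumes the HUB definition from [11], in which every stored word is followed by an implicit least significant bit equal to one, so the representable values are the odd multiples of half a unit in the last place. The function names (`hub_truncate`, `hub_value`, `rounding_error`) are hypothetical and chosen only for illustration.

```python
from fractions import Fraction
import math


def hub_truncate(x: Fraction, frac_bits: int) -> int:
    """Keep frac_bits fractional bits of x and discard the rest.

    The returned integer k is the stored word. No sticky bit, guard bit,
    or increment is involved (assumption: two's complement truncation,
    i.e. floor).
    """
    return math.floor(x * 2**frac_bits)


def hub_value(k: int, frac_bits: int) -> Fraction:
    """Value of stored word k once the implicit LSB (= 1) is appended.

    Assumed HUB interpretation: (2k + 1) / 2^(frac_bits + 1).
    """
    return Fraction(2 * k + 1, 2 ** (frac_bits + 1))


def rounding_error(x: Fraction, frac_bits: int) -> Fraction:
    """Absolute error after truncating x into the HUB format."""
    return abs(hub_value(hub_truncate(x, frac_bits), frac_bits) - x)


# Example: 5/16 = 0.0101b kept with two fractional bits.
x = Fraction(5, 16)
k = hub_truncate(x, 2)   # stored word 01b
print(hub_value(k, 2))   # 0.011b, prints 3/8
```

Because each stored word denotes the midpoint of the interval of exact values that truncate to it, the error in this sketch never exceeds half a unit in the last place, which is the same bound as conventional round-to-nearest with the same number of stored bits.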