HUB-Floating-Point for improving FPGA implementations of DSP Applications

Javier Hormigo, and Julio Villalba, Member, IEEE

Abstract—The increasing complexity of new digital signal-processing applications is forcing the use of floating-point numbers in their hardware implementations. In this brief, we investigate the advantages of using HUB formats to implement these floating-point applications on FPGAs. These new floating-point formats allow for the effective elimination of the rounding logic in floating-point arithmetic units. Firstly, we experimentally show that HUB and standard formats provide equivalent SNR in DSP application implementations. We then present a detailed study of the improvement achieved when implementing floating-point adders and multipliers on FPGAs by using HUB numbers. In most of the cases studied, the HUB approach reduces resource use and increases the speed of these FP units, while always providing accuracy statistically equivalent to that of conventional formats. However, for some specific sizes, HUB multipliers require far more resources than the corresponding conventional approach.

Index Terms—FPGA, floating-point, DSP applications, HUB-format

I. INTRODUCTION

Nowadays, many Digital Signal Processing (DSP) applications, such as graphics, wireless communications, industrial control, and medical imaging, require the use of linear algebra or other complex algorithms. The use of Floating-Point (FP) arithmetic is quickly becoming a requirement in these applications due to its extended dynamic range and precision. For this reason, FP arithmetic is being introduced in FPGA implementations, as a soft-core [1] [2], or even as a hardware block in the newest Altera devices [3]. Although these embedded hardware blocks are more efficient and cost effective than their equivalent soft-core designs, the latter are still very useful.
Firstly, low-cost devices do not offer these FP embedded blocks, and it is not clear that other FPGA brands are going to include something similar in their devices in the near future. Secondly, up to now, only single precision has been directly supported in DSP blocks [4]. Therefore, improvements to the soft-core implementations are of great value.

Some of these solutions are being designed to follow the IEEE 754 standard [5]. However, in many applications, compliance with this standard is sacrificed to obtain more efficient implementations regarding area and performance. In relation to FPGAs, much more efficient designs are obtained by using more flexible implementations of FP numbers and ensuring the fulfillment of certain quality parameters at the output. These flexible implementations could utilize word-length optimization [6] [7], high-radix representation [8], and fused datapath synthesis [9], or avoid the implementation of unnecessary rounding modes [2], exceptions, or subnormal support [1].

This work was supported in part by the Ministry of Education and Science of Spain under contracts TIN2013-42253-P. The authors are with the Department of Computer Architecture, Universidad de Málaga, Málaga E-29071 Spain (e-mail: fjhormigo@uma.es; jvillaba@uma.es).

Generally, both hard and soft cores only support the round-to-nearest-even (RNE) mode, since this is the most useful of the rounding modes. In these FP cores, a significant amount of resource use and delay is due to the rounding logic. However, two new families of formats, HUB (Half-Unit Biased) [10] and Round-to-Nearest [11] representations, allow RNE to be performed simply by truncation, which could make rounding logic negligible. Here, we focus on HUB formats. HUB fixed-point formats were used in [12] and [13] to improve DSP implementations, since they allow better word-length optimization.
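
The claim that truncation can act as round-to-nearest under a HUB format can be illustrated with a minimal sketch. It is not taken from the paper: it uses a non-negative fixed-point significand for simplicity and relies only on the HUB definition from [10], summarized in Section II (an implicit least significant bit equal to one). The function names and the use of exact fractions are illustrative assumptions.

```python
from fractions import Fraction
import math


def truncate_bits(x: Fraction, frac_bits: int) -> int:
    """Keep frac_bits fractional bits of x by plain truncation (floor)."""
    return math.floor(x * 2**frac_bits)


def conventional_value(bits: int, frac_bits: int) -> Fraction:
    """Value of the explicit bit-vector in a conventional format."""
    return Fraction(bits, 2**frac_bits)


def hub_value(bits: int, frac_bits: int) -> Fraction:
    """Value of the same bit-vector in a HUB format.

    The implicit LSB (always one) is appended below the explicit bits,
    so the value is biased by half an ulp of the explicit bits.
    """
    return Fraction(2 * bits + 1, 2 ** (frac_bits + 1))


if __name__ == "__main__":
    frac_bits = 2                      # ulp = 1/4 (hypothetical toy width)
    x = Fraction(7, 16)                # exact value to be stored
    bits = truncate_bits(x, frac_bits)
    print("explicit bits:", bits)
    print("conventional error:", abs(conventional_value(bits, frac_bits) - x))
    print("HUB error:", abs(hub_value(bits, frac_bits) - x))
```

In this toy case the truncated conventional value misses the input by more than half an ulp, whereas the HUB interpretation of the same explicit bits stays within half an ulp, which is the bound expected of round-to-nearest.
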
The ASIC implementation of HUB-FP units has been studied for binary16 (half), binary32 (single), and binary64 (double) [5], and important improvements have been achieved [14] [15]. In this brief communication, we extend this analysis to FPGAs over a wide range of sizes. Compared to previous articles, we provide:

• An experimental error analysis of the implementation of FIR filters, which shows that the HUB approach provides similar statistical parameters to those of standard FP implementations, including the SNR.
• The results of FPGA implementation of a basic FP adder and multiplier for a wide range of exponent and mantissa bit-widths under HUB and conventional approaches, and their comparison.

In most of the cases studied, the HUB format reduces resource use and increases the speed of these FP units. Furthermore, due to its simplicity, any existing soft or hard core could be easily enhanced by using the proposed approach. Therefore, based on basic architectures, our aim is to encourage researchers to improve their optimized FP cores or DSP applications by using HUB-FP formats.

II. HUB-FP NUMBERS AND ASSOCIATED CIRCUITS

Firstly, we summarize the main characteristics of the HUB-FP formats and circuits presented in [10] [15]. For proofs or further explanations, please refer to these papers.

A HUB-FP number is an FP number whose mantissa (or significand) has an Implicit Least Significant Bit (ILSB) that equals one. Compared with a standard format, it has the same number of explicit bits and the same precision, but the same bit-vector represents a value biased by half a Unit in the Last Place (ulp) [10]. For example, using an m-bit HUB number