A 630 Mbps Non-Binary LDPC Decoder for FPGA

J. O. Lacruz
Electrical Engineering Department, Universidad de Los Andes, Mérida, Venezuela
E-mail: jlacruz@ula.ve

F. García-Herrero, M. J. Canet, J. Valls and A. Pérez-Pascual
Instituto de Telecomunicaciones y Aplicaciones Multimedia, Universitat Politècnica de València, Gandia, Spain
E-mail: fragarh2@epsg.upv.es, {macasu, jvalls, asperez}@eln.upv.es

Abstract—A high-speed non-binary LDPC decoder based on the Trellis Min-Max algorithm with a layered schedule is presented. The proposed approach compresses the check-node output messages into a reduced set, decreasing the number of messages sent to the variable node. Additionally, the memory resources of the layered architecture are reduced. The proposed decoder was implemented for the (2304,2048) NB-LDPC code over GF(16) on a Virtex-7 FPGA and in a 90 nm CMOS process. Our implementation outperforms state-of-the-art NB-LDPC decoder implementations in both technologies, achieving throughputs of 630 and 965 Mbps, respectively.

I. INTRODUCTION

Non-Binary Low-Density Parity-Check (NB-LDPC) codes emerge as an alternative to their binary counterparts in scenarios that require short or medium codeword lengths and better performance at high signal-to-noise ratios (SNR). They also improve burst-error correction capability, especially over high-order Galois fields. On the other hand, the main drawbacks of NB-LDPC codes are: i) the high complexity of their check node (CN); ii) the large area spent on storage (RAM memories and registers); and iii) the routing congestion that limits the overall decoding throughput.

NB-LDPC codes were first investigated by Davey and MacKay [1] as an extension of binary LDPC codes. Since then, great efforts have been made to reduce the complexity of the original Q-ary Sum-of-Product Algorithm (QSPA) [1].
The Extended Min-Sum (EMS) [2] and Min-Max [3] algorithms were proposed as approximations of the QSPA [1], considerably reducing the CN complexity. However, EMS and Min-Max decoders cannot reach high throughput because they rely on forward-backward (FB) metrics in the CN processor, whose sequential computation limits parallelism.

Recently, the Trellis EMS (T-EMS) algorithm [4] [5] was proposed. It enables parallel processing of messages at the CN and increases the throughput compared with decoders that use FB metrics. The main disadvantage of T-EMS is that the CN complexity remains high because of this parallel processing, which leads to a large decoder area. The Simplified Trellis Min-Max (T-MM) algorithm [6] was proposed to reduce the CN complexity of T-EMS without compromising decoding performance. Despite the advantages of T-MM over its predecessors, the required area is still high due to the large number of storage elements, especially when a layered schedule is applied.

In this paper we propose an NB-LDPC decoder architecture for the T-MM algorithm that requires far fewer memory elements than a conventional implementation of this algorithm. The main idea is to minimize the messages exchanged between the CN and variable-node (VN) processors: we remove all redundant information and keep only the minimum set of values required to reconstruct every message at the VN processor. The proposed decoder architecture is implemented on a Virtex-7 FPGA for a (2304,2048) NB-LDPC code over GF(16) [7]. It needs 83% fewer memory resources than a conventional implementation of the T-MM algorithm [6], without introducing any performance loss. The achieved throughput is 630 Mbps, outperforming state-of-the-art NB-LDPC decoders implemented on FPGA devices [8] [9] [10].
The rest of the paper is organized as follows: Section II reviews the basics of the T-MM algorithm; Section III derives the check-node and top-level decoder architectures and presents implementation results for FPGA and ASIC. Finally, conclusions are outlined in Section IV.

II. BASICS OF NB-LDPC CODES AND THE T-MM DECODING ALGORITHM

NB-LDPC codes are linear block codes defined by a sparse parity-check matrix $H$ with $M$ rows and $N$ columns, where each non-zero element $h_{m,n}$ belongs to the Galois field $\mathrm{GF}(q = 2^p)$. We consider regular NB-LDPC codes with constant row weight $d_c$ and column weight $d_v$. Each row (column) of $H$ is associated with a check node, CN (variable node, VN). $Q_{m,n}(a)$ and $R_{m,n}(a)$ denote the messages exchanged from VN to CN and from CN to VN, respectively, for each symbol $a \in \mathrm{GF}(q)$. $\mathcal{N}(m)$ and $\mathcal{M}(n)$ denote the sets of non-zero elements in row $m$ and column $n$ of $H$, respectively.

The Trellis Min-Max (T-MM) algorithm [6] calculates the output CN reliabilities by organizing the $\Delta Q_{m,n}(a)$ messages in a trellis and adding an extra column, $\Delta Q(a)$, which enables parallel processing in the CN processor. $\Delta Q_{m,n}(a)$ is the delta-domain information, defined as $\Delta Q_{m,n}(a + z_n) = Q_{m,n}(a)$, where $z_n$, $n \in \mathcal{N}(m)$, are the tentative hard-decision symbols. To represent the trellis in a CN, the reliability information is organized in a matrix with the GF symbols in its rows and the indices $n \in \mathcal{N}(m)$ in its columns. Therefore, once the delta domain is applied, the most reliable symbols are located in the first row of the trellis (symbol $a = 0$, since $\Delta Q_{m,n}(0) = Q_{m,n}(z_n)$), which is the hard-decision path.

T-MM requires the computation of the two most reliable messages per row of $\Delta Q_{m,n}(a)$. The most reliable values, $m1(a)$, are used to compute the extra-column values using (1):

$$\Delta Q(a) = \min_{a' \in \mathrm{conf}(1,2)} \max\big(m1(a')\big) \qquad (1)$$

978-1-4799-8391-9/15/$31.00 ©2015 IEEE 1989
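The quantities defined in Section II can be illustrated with a minimal Python sketch covering the delta-domain shift, the per-row search for $m1(a)$ and $m2(a)$, and the extra column of (1). It is not the authors' architecture, and several points are assumptions: symbols are the integers $0,\dots,q-1$ in a polynomial basis, so addition in $\mathrm{GF}(2^p)$ is bitwise XOR; messages follow the Min-Max convention (smaller value means more reliable, with the hard decision at 0); and $\mathrm{conf}(1,2)$ is read as the set of configurations that reach symbol $a$ with one deviation, or with two non-zero deviations whose GF sum is $a$, each costed by the maximum of its $m1$ values, without checking that the two deviations come from different columns. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical helpers illustrating the T-MM quantities of Section II.
# Assumptions (not from the paper): GF(2^p) symbols are integers 0..q-1 in a
# polynomial basis, so field addition is XOR; messages follow the Min-Max
# convention (smaller = more reliable, hard-decision symbol has value 0).


def gf_add(a: int, b: int) -> int:
    """Addition in GF(2^p): bitwise XOR of the symbol representations."""
    return a ^ b


def hard_decision(q_msg: Sequence[float]) -> int:
    """Tentative hard decision z_n: the most reliable (smallest) symbol."""
    return min(range(len(q_msg)), key=lambda a: q_msg[a])


def to_delta_domain(q_msg: Sequence[float], z_n: int) -> List[float]:
    """dQ(a + z_n) = Q(a); since z_n + z_n = 0, this is dQ(x) = Q(x + z_n)."""
    return [q_msg[gf_add(x, z_n)] for x in range(len(q_msg))]


def two_minima(
    delta_cols: Sequence[Sequence[float]],
) -> Tuple[List[float], List[float], List[int]]:
    """For each trellis row a: m1(a), m2(a) over all columns, and m1's column."""
    q = len(delta_cols[0])
    m1 = [float("inf")] * q
    m2 = [float("inf")] * q
    m1_col = [-1] * q
    for n, col in enumerate(delta_cols):
        for a in range(q):
            v = col[a]
            if v < m1[a]:  # new minimum: the old m1 becomes m2
                m2[a], m1[a], m1_col[a] = m1[a], v, n
            elif v < m2[a]:  # new second minimum
                m2[a] = v
    return m1, m2, m1_col


def extra_column(m1: Sequence[float]) -> List[float]:
    """Extra column dQ(a) computed from m1 only, in the spirit of (1).

    Assumed reading of conf(1,2): reach symbol a with a single deviation
    (cost m1(a)) or two non-zero deviations a1 + a2 = a (cost
    max(m1(a1), m1(a2))). The different-column condition is NOT checked.
    """
    q = len(m1)
    dq = [0.0] * q  # row 0 is the hard-decision path, so dQ(0) = 0
    for a in range(1, q):
        best = m1[a]  # single deviation to symbol a
        for a1 in range(1, q):
            a2 = gf_add(a, a1)  # a1 + a2 = a in GF(2^p)
            if a2 != 0:  # a2 == 0 means a1 == a, i.e. the single-deviation case
                best = min(best, max(m1[a1], m1[a2]))
        dq[a] = best
    return dq


if __name__ == "__main__":
    # Toy check node with d_c = 3 over GF(4) (q = 4) so results are hand-checkable.
    cols = [[0.0, 2.0, 5.0, 3.0], [4.0, 0.0, 1.0, 6.0], [3.0, 7.0, 2.0, 0.0]]
    zs = [hard_decision(c) for c in cols]  # [0, 1, 3]
    deltas = [to_delta_domain(c, z) for c, z in zip(cols, zs)]
    m1, m2, m1_col = two_minima(deltas)
    print(m1, m2, m1_col)  # [0.0, 2.0, 5.0, 1.0] [0.0, 2.0, 6.0, 3.0] [0, 0, 0, 1]
    print(extra_column(m1))  # [0.0, 2.0, 2.0, 1.0]
```

The toy example uses GF(4) rather than GF(16) only so the numbers can be checked by hand; the functions take $q$ from the message length. Because only $m1$ enters (1), the extra column in this sketch costs on the order of $q^2$ comparisons per check node, independently of $d_c$, while the $m1$/$m2$ search grows linearly with $d_c$.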