A 630 Mbps Non-Binary LDPC Decoder for FPGA
J. O. Lacruz
Electrical Engineering Department
Universidad de Los Andes. Mérida, Venezuela
E-mail: jlacruz@ula.ve
F. García-Herrero, M. J. Canet, J. Valls and A. Pérez-Pascual
Instituto de Telecomunicaciones y Aplicaciones Multimedia
Universitat Politècnica de València. Gandia, Spain
E-mail: fragarh2@epsg.upv.es, {macasu, jvalls,asperez}@eln.upv.es
Abstract—A high-speed non-binary LDPC decoder based on the Trellis Min-Max algorithm with a layered schedule is presented. The proposed approach compresses the check-node output messages into a reduced set, decreasing the number of messages sent to the variable node. Additionally, the memory resources required by the layered architecture are reduced. The proposed decoder was implemented for the (2304,2048) NB-LDPC code over GF(16) on a Virtex-7 FPGA and in a 90 nm CMOS process. Our implementation outperforms state-of-the-art NB-LDPC decoder implementations for both technologies, achieving throughputs of 630 and 965 Mbps, respectively.
I. INTRODUCTION
Non-Binary Low-Density Parity-Check (NB-LDPC) codes have emerged as an alternative to their binary counterparts in scenarios where short/medium codeword lengths and better performance at high signal-to-noise ratios (SNR) are required. Additionally, they improve burst-error correction capability, especially with high-order Galois fields. On the other hand, the main drawbacks of NB-LDPC codes are: i) the high complexity of their check node (CN); ii) the large amount of area spent on storage (RAM memories and registers); and iii) the routing congestion that limits the overall decoding throughput.
NB-LDPC codes were first investigated by Davey and MacKay [1] as an extension of binary LDPC codes. Since then, great efforts have been made to reduce the complexity of the original Q-ary Sum-Product Algorithm (QSPA) [1]. The Extended Min-Sum (EMS) [2] and Min-Max [3] algorithms were proposed as approximations of the QSPA [1], considerably reducing the CN complexity. However, the EMS and Min-Max algorithms are unable to reach high throughput because they use forward-backward (FB) metrics in the CN processor, which must be computed sequentially over the CN inputs.
Recently, the Trellis EMS (T-EMS) algorithm [4], [5] was proposed. It enables parallel processing of the messages at the CN and increases the throughput in comparison with decoders that use FB metrics. The main disadvantage of the T-EMS algorithm is that the CN complexity is still high due to the parallel processing, which leads to a large-area decoder. The simplified Trellis Min-Max (T-MM) algorithm [6] was proposed with the aim of reducing the CN complexity of the T-EMS algorithm without compromising decoding performance. Despite the advantages of T-MM over its predecessors, the required area is still high due to the large number of storage elements, especially when a layered schedule is applied.
In this paper, we propose an NB-LDPC decoder architecture for the T-MM algorithm that requires far fewer memory elements than the conventional implementation of this algorithm. The main idea is to minimize the messages exchanged between the CN and VN processors. To this end, we remove any redundant information and keep only the minimum set of values required to reconstruct all the messages at the VN processor. The proposed decoder architecture is implemented on a Virtex-7 FPGA for a (2304,2048) NB-LDPC code over GF(16) [7]. It requires 83% fewer memory resources than a conventional implementation of the T-MM algorithm [6] without introducing any performance loss. The achieved throughput is 630 Mbps, outperforming state-of-the-art NB-LDPC decoders implemented on FPGA devices [8], [9], [10].
The rest of the paper is organized as follows. Section II reviews the basics of the T-MM algorithm. In Section III, the check-node and top-level decoder architectures are derived, and implementation results for FPGA and ASIC are presented. Finally, conclusions are outlined in Section IV.
II. BASICS OF NB-LDPC CODES AND THE T-MM DECODING ALGORITHM
NB-LDPC codes are linear block codes defined by a sparse parity-check matrix $H$ with $M$ rows and $N$ columns, where each non-zero element $h_{m,n}$ belongs to the Galois field $\mathrm{GF}(q = 2^p)$. We consider regular NB-LDPC codes with constant row weight $d_c$ and column weight $d_v$. Each row (column) of $H$ is associated with a check node CN (variable node VN). $Q_{m,n}(a)$ and $R_{m,n}(a)$ denote the messages exchanged from VN to CN and from CN to VN, respectively, for each symbol $a \in \mathrm{GF}(q)$. $\mathcal{N}(m)$ and $\mathcal{M}(n)$ denote the sets of indices of the non-zero elements in row $m$ and column $n$ of $H$, respectively.
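As a minimal sketch of this notation, assuming a hypothetical toy matrix over GF(4) whose symbols are encoded as the integers 0 to 3 (it is not the (2304,2048) GF(16) code used in this paper), the index sets $\mathcal{N}(m)$ and $\mathcal{M}(n)$ and the weights $d_c$ and $d_v$ can be read directly off $H$. The helper names `row_sets` and `col_sets` are illustrative only.

```python
# Hypothetical toy parity-check matrix over GF(4): entries 0..3, where 0 is the zero element.
# It only illustrates the notation; it is not the code from the paper.
H = [
    [1, 2, 0, 3, 0, 0],
    [0, 3, 1, 0, 2, 0],
    [2, 0, 0, 1, 0, 3],
    [0, 0, 2, 0, 1, 1],
]

def row_sets(H):
    """N(m): column indices n with a non-zero h_{m,n} in row m."""
    return [[n for n, h in enumerate(row) if h != 0] for row in H]

def col_sets(H):
    """M(n): row indices m with a non-zero h_{m,n} in column n."""
    return [[m for m in range(len(H)) if H[m][n] != 0] for n in range(len(H[0]))]

N_m = row_sets(H)
M_n = col_sets(H)

# A regular code has a single row weight d_c and a single column weight d_v,
# so each of these sets should contain exactly one value.
d_c = {len(s) for s in N_m}
d_v = {len(s) for s in M_n}

print(N_m)       # -> [[0, 1, 3], [1, 2, 4], [0, 3, 5], [2, 4, 5]]
print(M_n)       # -> [[0, 2], [0, 1], [1, 3], [0, 2], [1, 3], [2, 3]]
print(d_c, d_v)  # -> {3} {2}
```

Here every row has $d_c = 3$ and every column has $d_v = 2$, so the toy matrix is regular; the messages $Q_{m,n}(a)$ and $R_{m,n}(a)$ would only exist for the $(m, n)$ pairs listed in these sets.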
The Trellis Min-Max (T-MM) algorithm [6] calculates the output CN reliabilities by organizing the $\Delta Q_{m,n}(a)$ messages in a trellis and including an extra column $\Delta Q(a)$, which enables parallel processing in the CN processor. $\Delta Q_{m,n}(a)$ is the delta-domain information, defined as $\Delta Q_{m,n}(a + z_n) = Q_{m,n}(a)$, where $z_n$, $\forall n \in \mathcal{N}(m)$, are the tentative hard-decision symbols.
To represent the trellis in a CN, the reliability information is organized in a matrix with the GF symbols in its rows and the indices $n \in \mathcal{N}(m)$ in its columns. Once the delta domain is applied, the most reliable symbol of every column is mapped to the first row of the trellis ($a = 0$), which is therefore the hard-decision path. T-MM requires the computation of the two most reliable messages in each row of $\Delta Q_{m,n}(a)$. The most reliable values, $m1(a)$, are used to compute the extra-column values using (1).
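A minimal sketch of the per-row search follows. It assumes the trellis is stored column by column, with `trellis[n][a]` holding $\Delta Q_{m,n}(a)$, and it records, in addition to $m1(a)$ and $m2(a)$, the column index of $m1(a)$. That index (`idx1`) and the function name are hypothetical additions that this section does not specify. Sorting is used only for readability; a hardware CN would more likely rely on comparator networks.

```python
# Per-row two-minimum search over a CN trellis.
# Assumption: trellis[n][a] = dQ_{m,n}(a), i.e. one list per column n in N(m), indexed by symbol a.
# Smaller value = more reliable. idx1 (column of m1) is a hypothetical extra output.

def two_minima_per_row(trellis):
    """Return (m1, m2, idx1), each indexed by the GF symbol a (the trellis row)."""
    q = len(trellis[0])
    m1, m2, idx1 = [], [], []
    for a in range(q):
        # Collect (value, column) pairs for row a and order them by reliability.
        row = sorted((col[a], n) for n, col in enumerate(trellis))
        (v1, n1), (v2, _) = row[0], row[1]
        m1.append(v1)
        m2.append(v2)
        idx1.append(n1)
    return m1, m2, idx1

# Hypothetical delta-domain trellis with d_c = 3 columns over GF(4);
# row a = 0 is the hard-decision path and is all zeros after normalization.
trellis = [
    [0, 1, 3, 5],
    [0, 4, 2, 6],
    [0, 2, 7, 1],
]
m1, m2, idx1 = two_minima_per_row(trellis)
print(m1, m2, idx1)  # -> [0, 1, 2, 1] [0, 2, 3, 5] [0, 0, 1, 2]
```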
$$\Delta Q(a) = \min_{a' \in \mathrm{conf}^{*}(1,2)} \max\big(m1(a')\big) \tag{1}$$
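The set $\mathrm{conf}^{*}(1,2)$ is not defined in this section, so the sketch below rests on an assumed reading of (1). For each non-zero symbol $a$, a configuration deviates from the hard-decision path either in a single row ($a' = a$) or in two distinct non-zero rows $b$ and $c$ with $b + c = a$ (bitwise XOR in GF($2^p$)), and only $m1$ values are used. It does not check whether the two $m1$ values come from different columns, and it sets $\Delta Q(0) = 0$ for the hard-decision row. Both of these choices are assumptions, and the function name is hypothetical.

```python
# Sketch of Eq. (1) under an ASSUMED reading of conf*(1,2):
#   - one deviation:  a' = a                      -> candidate m1(a)
#   - two deviations: rows b != c, both non-zero,
#                     with b XOR c = a            -> candidate max(m1(b), m1(c))
# Whether m1(b) and m1(c) come from different columns is NOT checked here.

def extra_column(m1):
    q = len(m1)
    dQ_extra = [0] * q  # row 0 (hard-decision path) assumed to be 0
    for a in range(1, q):
        best = m1[a]  # single-deviation configuration
        for b in range(1, q):
            c = b ^ a  # GF(2^p) addition: b + c = a  <=>  c = b XOR a
            if c != 0 and b < c:  # distinct non-zero pair, each unordered pair counted once
                best = min(best, max(m1[b], m1[c]))
        dQ_extra[a] = best
    return dQ_extra

# Hypothetical m1 values over GF(4).
m1 = [0, 1, 2, 1]
print(extra_column(m1))  # -> [0, 1, 1, 1]
```

In this example, the two-deviation configuration $\{1, 3\}$ lowers $\Delta Q(2)$ from $m1(2) = 2$ to $\max(1, 1) = 1$. This shows how the min-max over configurations can select a cheaper path than the single-deviation value.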
978-1-4799-8391-9/15/$31.00 ©2015 IEEE 1989