Factorizable modulo M parallel architecture for DVB-S2 LDPC decoding

Marco Gomes, Gabriel Falcão, Vitor Silva, Vitor Ferreira, Alexandre Sengo and Miguel Falcão*
Instituto de Telecomunicações, Pólo II da Universidade de Coimbra, 3030-290 Coimbra, Portugal
*Chipidea Microelectrónica S.A., Rua Frederico Ulrich, n. 2650, 4470-605 Moreira da Maia, Portugal
e-mail: marco@co.it.pt, gff@co.it.pt, vitor@co.it.pt, vitorhugo@co.it.pt, sengo@co.it.pt, mfalcao@chipidea.com

Abstract: State-of-the-art decoders for DVB-S2 low-density parity-check (LDPC) codes exploit semi-parallel architectures based on the periodicity factor M = 360 of the special type of LDPC-IRA codes adopted. This paper generalizes a well-known hardware M-kernel parallel structure and proposes an efficient partitioning by any factor of M, with no addressing overhead and with the efficient message memory mapping scheme left unchanged. Our method provides a simple and efficient way to reduce the decoder complexity. Synthesis of the decoder for a Xilinx FPGA shows a minimum throughput above the required 90 Mbps.

I. INTRODUCTION

The recent Digital Video Satellite Broadcast Standard (DVB-S2) [1] [2] has adopted a powerful FEC scheme based on the serial concatenation of BCH and Low Density Parity Check (LDPC) codes. This new FEC structure, combined with the adoption of high-order modulations (QPSK, 8PSK, 16APSK and 32APSK), is able to provide capacity gains of about 30% over the previous DVB-S standard [2], with the LDPC codes playing a fundamental role in this performance improvement.

LDPC codes are linear block codes defined by sparse parity-check matrices, H [3] [4] [5], and are usually represented by Tanner graphs [6]. A Tanner graph is a bipartite graph formed by two types of nodes: check nodes (ν_C), one per code constraint, and bit nodes, one per codeword bit (information and parity bits, denoted ν_I and ν_P respectively), with the edges between them given by H.
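As a small illustration of the Tanner graph description above, the following Python sketch derives the two adjacency lists (check node to bit nodes, and bit node to check nodes) from a binary parity-check matrix. The function name `tanner_graph` and the toy matrix `H_TOY` are hypothetical and chosen only for illustration; `H_TOY` is not a DVB-S2 code.

```python
from typing import Dict, List, Tuple


def tanner_graph(H: List[List[int]]) -> Tuple[Dict[int, List[int]], Dict[int, List[int]]]:
    """Return (check -> bits, bit -> checks) adjacency lists for a binary matrix H.

    Row m of H is check node m, column n is bit node n, and an edge
    joins them whenever H[m][n] is non-zero.
    """
    check_to_bits = {
        m: [n for n, h in enumerate(row) if h] for m, row in enumerate(H)
    }
    n_bits = len(H[0])
    bit_to_checks = {
        n: [m for m, row in enumerate(H) if row[n]] for n in range(n_bits)
    }
    return check_to_bits, bit_to_checks


# Toy 3 x 6 parity-check matrix (assumption: illustrative only).
H_TOY = [
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 1],
]

checks, bits = tanner_graph(H_TOY)
for m, neighbours in checks.items():
    print(f"check node {m} is connected to bit nodes {neighbours}")
```

Each non-zero entry of H yields exactly one edge, so both adjacency lists contain the same total number of edges.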
LDPC codes are decoded using low-complexity iterative belief propagation algorithms operating over the Tanner graph description [7]. However, a major drawback is their high encoding complexity, caused by the fact that the generator matrix, G, is in general not sparse. To overcome this problem, the DVB-S2 standard has adopted a special class of LDPC codes with linear encoding complexity, known as Irregular Repeat-Accumulate (IRA) codes [8] [9].

An important issue in the design of LDPC encoder and decoder architectures for DVB-S2 is the fact that the standard supports two different frame lengths (16200 bits for low-delay applications and 64800 bits otherwise) and a set of different code rates (1/4, 1/3, 2/5, 1/2, 3/5, 2/3, 3/4, 4/5, 5/6, 8/9 and 9/10) for both frame lengths and different modulation schemes [1] [9]. A different LDPC code is defined for each mode of operation and, although these codes share a similar structure and properties, this still poses an enormous challenge for the development of an encoder and a decoder fully compliant with all operating modes.

The state-of-the-art decoder is based on a flexible partially parallel architecture that exploits the M = 360 periodicity of DVB-S2 LDPC codes [10]. Although capable of providing a throughput far above the minimum mandatory rate of 90 Mbps, this architecture requires a huge ASIC area of 22.74 mm² in an ST Microelectronics 0.13 μm technology, mainly due to the high number (360) of computation kernels, or functional units (FU), and the large width of the barrel shifter. In order to decrease the number of computation kernels to only 45 FUs and to reduce the width of the barrel shifter, an alternative solution was proposed [11] which uses a restructured version of H. As a consequence, this approach increases the complexity of the DVB-S2 de-interleaver and almost doubles the input memory relative to [10].

In this paper we generalize the architecture of [10] and overcome its disadvantages.
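To make the operating modes listed above more concrete, the following Python sketch tabulates, for the 64800-bit frame only, the number of information bits K and parity bits N - K implied by each nominal code rate, and checks that both are multiples of the periodicity M = 360. The names `normal_frame_dimensions` and `DIMENSIONS` are hypothetical. The 16200-bit frame is deliberately left out, under the assumption that K cannot be obtained there by simply multiplying the nominal rate by the frame length.

```python
from fractions import Fraction

M = 360            # periodicity of the DVB-S2 LDPC codes
N_NORMAL = 64800   # codeword length of the long frame, in bits

# Nominal code rates quoted in the text.
RATES = [
    Fraction(1, 4), Fraction(1, 3), Fraction(2, 5), Fraction(1, 2),
    Fraction(3, 5), Fraction(2, 3), Fraction(3, 4), Fraction(4, 5),
    Fraction(5, 6), Fraction(8, 9), Fraction(9, 10),
]


def normal_frame_dimensions(rate: Fraction) -> tuple:
    """Return (K, N - K) for the 64800-bit frame at the given nominal rate."""
    k = rate * N_NORMAL
    # Sanity check of the sketch's assumption that K is an integer.
    assert k.denominator == 1, "rate does not give an integer K"
    k = int(k)
    return k, N_NORMAL - k


DIMENSIONS = {rate: normal_frame_dimensions(rate) for rate in RATES}

for rate, (k, parity) in DIMENSIONS.items():
    print(f"rate {rate}: K = {k}, N - K = {parity}, (N - K) / M = {parity // M}")
```

For every rate in the list, both K and N - K divide evenly by 360, which is the regularity that a 360-kernel parallel decoder relies on.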
We will show that it is possible to reduce the number of computation kernels to any integer factor of M = 360, with no addressing overhead and with the efficient message memory mapping scheme of [10] left unchanged. Our strategy also reduces the width of the barrel shifter by the same factor and considerably simplifies the routing problem. The throughput is reduced by the same factor, but this is not a real problem since the architecture of [10] provides a throughput far above the mandatory minimum rate. Thus, we provide a simple and efficient method to reduce the decoder complexity without losing sight of the throughput goals.

The next section briefly describes DVB-S2 LDPC-IRA codes. Section III addresses LDPC decoding for DVB-S2 using a partially parallel architecture and its generalization by sub-sampling it by a factor of M. Synthesis results are presented in Section IV and final conclusions are drawn in Section V.

II. DVB-S2 LDPC-IRA CODES

The new DVB-S2 standard [1] [9] adopted a special class of LDPC codes, known as IRA codes [8], as the main solution for the FEC system. An IRA code is characterized by a parity-check matrix, H, of the form,