RECONFIGURABLE DECODER ARCHITECTURES FOR RAPTOR CODES

Hady Zeineddine and Mohammad M. Mansour
ECE Department, American University of Beirut, Beirut, Lebanon
Email: {hma41,mmansour}@aub.edu.lb

ABSTRACT

Decoder architectures for architecture-aware Raptor codes having regular message access-and-processing patterns are presented. Raptor codes are a class of concatenated codes composed of a fixed-rate precode and a Luby-Transform (LT) code that can be used as rateless error-correcting codes over communication channels. In the proposed approach, the decoding procedure is mapped to row processing of a regular matrix, which adapts effectively to the code’s randomness and degree irregularity. This is achieved by 1) developing reconfigurable check node processors that attain a constant throughput while processing LT and LDPC nodes of varying degrees and numbers, 2) applying pseudo-random permutation to the communicated messages, and 3) computing bit-to-check messages in a serial, temporally distributed manner. A serial decoder for a rate-0.4 code implementing the proposed approach was synthesized in 65 nm CMOS technology. Hardware simulations show that the decoder achieves a throughput of 22 Mb/s at a BER of $10^{-6}$, dissipates an average power of 222 mW, and occupies an area of 1.77 mm$^2$. A range of partially-parallel decoders with desired throughput can be designed by replicating the processing nodes of a serial decoder.

I. INTRODUCTION

A Raptor code is constructed by concatenating a fixed-rate precode with a rateless LT code [1]. Raptor codes were originally designed to operate on erasure channels and were later extended to correct errors over other communication channels [2]. The rate of a Raptor code is determined on a block-by-block basis, or can even be changed for the same block upon a decoding failure, which makes Raptor codes advantageous for use over binary-input memoryless symmetric channels.
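The construction described above, a fixed-rate precode followed by a rateless LT code, can be illustrated with a minimal Python sketch. It is only a toy under stated assumptions: the single-parity precode stands in for a real LDPC precode, the degree weights are arbitrary, and the names `precode` and `lt_symbols` are hypothetical rather than taken from [1] or [2]. It shows only how further output symbols can be drawn on demand, which is what lets the rate change for the same block after a decoding failure.

```python
import random
from itertools import islice

def precode(bits):
    """Toy fixed-rate precode (assumption): append one overall parity bit.
    A real Raptor code would use, e.g., an LDPC precode here."""
    return bits + [sum(bits) % 2]

def lt_symbols(intermediate, degree_weights, seed=0):
    """Yield an endless stream of LT output symbols.
    Each symbol is (neighbor_indices, xor_of_those_intermediate_bits)."""
    rng = random.Random(seed)
    degrees = list(degree_weights.keys())
    weights = list(degree_weights.values())
    n = len(intermediate)
    while True:
        # Draw a check degree from the (illustrative) degree distribution.
        d = rng.choices(degrees, weights=weights)[0]
        # Pick d distinct intermediate bits uniformly at random.
        neighbors = sorted(rng.sample(range(n), d))
        value = 0
        for i in neighbors:
            value ^= intermediate[i]
        yield neighbors, value

message = [1, 0, 1, 1, 0, 0, 1, 0]
intermediate = precode(message)
stream = lt_symbols(intermediate, {1: 0.1, 2: 0.5, 3: 0.3, 4: 0.1}, seed=1)

first = list(islice(stream, 12))  # initial transmission
extra = list(islice(stream, 4))   # sent only if decoding fails
```

Because the generator never terminates, the effective code rate is fixed only by how many symbols the receiver ends up consuming, here 8 message bits over 12 or 16 transmitted symbols.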
LT codes can be decoded efficiently using Gallager’s iterative two-phase message-passing (TPMP) algorithm, typically used in LDPC decoding [3]. When the precode is an LDPC code, the TPMP algorithm can be applied to the concatenated LT-LDPC code instead of decoding in two stages. Joint decoding achieves better coding performance, converges faster, and allows the same hardware resources to be used for LT and LDPC decoding. This motivates the need for a hardware-efficient decoder architecture for Raptor codes that have an LDPC code as a precode.

The peculiar features of Raptor codes pose serious challenges to applying the optimizations developed for hardware-efficient LDPC decoders (e.g., [4]–[7]) to Raptor decoders. These features include variable code rate, random LT encoding, variable check-degree distribution, and joint decoding of the LT code and LDPC precode. These irregularity and randomness features lead to low resource utilization, high control overhead, complex data movement patterns, and stringent memory requirements, thus resulting in a highly inefficient implementation.

This work was supported by funds from the University Research Board at the American University of Beirut.

In [8], a method to construct architecture-aware (AA) Raptor codes was proposed. This method embeds compatible structure into both the LT and LDPC codes and decouples code structuring from random LT encoding. In this paper, a decoder architecture for this class of AA-Raptor codes is presented. The proposed approach uses the code structure and architectural optimizations to map the decoding procedure into row processing of a regular matrix. The decoding schedule is hence made simple, regular, and identical across both LT and LDPC codes.
To this end, the bit-to-check message computation is temporally distributed so that varying the rate or the bit-node degrees changes the number of cycles per decoding iteration while leaving the workload per cycle unchanged. To solve the check-degree variability problem, a novel reconfigurable check-function unit (CFU) with constant throughput is designed to process LT nodes whose degrees sum to a constant $p$ and LDPC nodes whose degrees are a multiple of $p$.

The remainder of the paper is organized as follows. Section II presents the decoding schedule and the corresponding serial architecture. The reconfigurable check-node unit design is described in Section III, and Section IV gives hardware simulation results for the serial decoder implementation.

II. DECODER ARCHITECTURE

The decoding schedule, and consequently the architecture, is based on the following three features of the Raptor codes constructed in [8]. Throughout the paper, $p$ is assumed to be prime.

Source matrix: The LT code is derived from a $p \times (p-1)$ matrix $H_0 = [h_{ij}]$ of $p \times p$ shifted identity matrices. Each nonzero element of $h_{ij}$ is in turn a $p \times p$ shifted identity matrix.

Row-splitting: Every row in $H_0$, having weight $p-1$, is split into several rows. The formed rows are appended to the LT matrix.

LDPC matrix: Each row in the $p^2 \times p^2(p-1)$ LDPC matrix $H_p$ has weight (check degree) $c(p-1)$ and is obtained by merging (or, equivalently, XORing) $c$ rows of $H_0$.

II-A. Serial Decoder Architecture

The serial decoder processes messages corresponding to one row of $H_0$ per cycle. Figure 1 illustrates the decoder architecture. For clarity of exposition, let $H_R$ be a submatrix of $H_0$ composed of $M$ rows used to generate the LT graph, concatenated with the $cp^2$ rows

978-1-4577-0539-7/11/$26.00 ©2011 IEEE, ICASSP 2011