PIPELINED PARALLEL ARCHITECTURE FOR HIGH THROUGHPUT MAP DETECTORS

Ruwan Ratnayake, Gu-Yeon Wei and Aleksandar Kavčić

Division of Engineering and Applied Sciences
Harvard University, MA 02138, USA

ABSTRACT

A maximum a posteriori probability (MAP) detector based on a forward-only algorithm with high throughput is considered in this paper. MAP detection gives optimal performance and, with Turbo decoding, can achieve performance close to the channel capacity limits. A deep pipelined architecture for the forward-only method is presented and compared with other throughput-increasing methods. Simulation results based on the iterative MAP-LDPC (low-density parity-check) system are shown. Hardware implementation issues that exploit the regularities of the structure are also discussed.

1. INTRODUCTION

High-speed detectors are of interest in research as well as in industry, particularly in magnetic recording, where speeds on the order of 1 Gbps are needed. Naturally, proposed methods that perform in the Gbps range use computationally less intensive algorithms such as the Viterbi detector, which generates hard outputs [1]. Even though these detectors reach the 1 Gbps milestone in throughput, their inherent inability to generate soft outputs makes them less attractive for use in iterative systems. Thus, algorithms that give soft outputs, such as the soft output Viterbi algorithm (SOVA), are attractive since they can exploit iterative detection for better performance [2]. However, these are still suboptimal algorithms in terms of bit error rate (BER) performance. Up to now, maximum a posteriori (MAP) algorithms that give optimal performance have not been considered for high-speed detectors due to their computational complexity. MAP detectors have so far only targeted wireless communication systems, where data throughput requirements are much lower.

The MAP algorithm by Bahl, Cocke, Jelinek and Raviv (BCJR) requires forward and backward computations (FB-BCJR) [3].
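To make the forward and backward dependence concrete, the following is a minimal Python sketch of the FB-BCJR recursions on a generic trellis. The names `normalize`, `forward_backward`, `gamma`, `alpha`, `beta` and the toy two-state numbers are assumptions for illustration; they follow conventional BCJR notation rather than anything defined in this paper, and the sketch works in the probability domain with no attention to hardware cost.

```python
def normalize(v):
    """Scale non-negative numbers to sum to 1 (returned unchanged if all zero)."""
    total = sum(v)
    return [x / total for x in v] if total > 0 else list(v)


def forward_backward(gamma, init):
    """Return state APPs for every trellis time index.

    gamma[k][i][j] is the (assumed, illustrative) branch metric for moving
    from state i at time k to state j at time k + 1 given sample k;
    init[i] is the prior probability of starting in state i.
    """
    n_steps = len(gamma)
    n_states = len(init)

    # Forward recursion: alpha[k] only needs samples before time k.
    alpha = [list(init)]
    for k in range(n_steps):
        nxt = [sum(alpha[k][i] * gamma[k][i][j] for i in range(n_states))
               for j in range(n_states)]
        alpha.append(normalize(nxt))

    # Backward recursion: beta[k] needs every sample from time k onward,
    # so it can only start once the whole block has been received.
    beta = [None] * (n_steps + 1)
    beta[n_steps] = [1.0] * n_states
    for k in range(n_steps - 1, -1, -1):
        prev = [sum(gamma[k][i][j] * beta[k + 1][j] for j in range(n_states))
                for i in range(n_states)]
        beta[k] = normalize(prev)

    # The APP at time k combines both metrics.
    return [normalize([alpha[k][i] * beta[k][i] for i in range(n_states)])
            for k in range(n_steps + 1)]


# Toy two-state, three-step trellis (made-up numbers for illustration only).
gamma = [
    [[0.40, 0.10], [0.20, 0.30]],
    [[0.05, 0.45], [0.25, 0.25]],
    [[0.30, 0.20], [0.10, 0.40]],
]
init = [0.5, 0.5]
app = forward_backward(gamma, init)
```

Because `beta[k]` depends on every later branch metric, no APP can be produced in this sketch until the backward pass over the block has finished.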
This is in contrast to the Viterbi or SOVA algorithms, which allow the computations to be performed only in the forward direction. Once the input stream is fed into Viterbi/SOVA detectors, the output is generated after a fixed delay and retains the same order [4]. However, the a posteriori probability (APP) output of the FB-BCJR algorithm can only be evaluated after both forward and backward metrics are computed. Inevitably, the outgoing symbols appear in a permuted order relative to the incoming symbols.

There is a scheme that performs MAP detection with computations only in the forward direction [5]. We call this algorithm forward-only BCJR (FOBCJR). The data flow of this algorithm is similar to that of the Viterbi algorithm: soft outputs are computed after a fixed delay relative to the incoming symbols, resulting in ordered outputs. Like the Viterbi algorithm, FOBCJR keeps track of soft survivors, which are kept in a fixed-length sliding-window survivor memory. A prominent feature of FOBCJR is its parallel structure, whereas FB-BCJR only allows sequential state metric computations. Parallelism facilitates pipelining, resulting in an increase in throughput. In terms of throughput and input-to-output delay, FOBCJR with a deep pipelined structure is superior to other methods for computing APPs.

This paper begins with an overview of the FOBCJR algorithm. Then, in Section 3, we introduce three possible schemes to improve the throughput of the FOBCJR algorithm. By making quantitative comparisons, we show that one of these methods, which is based on deep pipelined computations, outperforms the other two in terms of throughput and requires less hardware. Section 4 presents the simulation results for these methods. Section 5 discusses some implementation methods that can exploit the inherent properties of the FOBCJR algorithm. Finally, Section 6 concludes the paper.

2. FORWARD ONLY BCJR

As implied by the name, FOBCJR computes state metrics only in the forward direction. It performs three basic tasks, namely extend, update and collect. This is similar to the Viterbi algorithm, which performs extend, update and select. Extend and update are recursive operations. The extend operation extends the state buffer by adding one new column of state metrics based on the current received sample and the previous state metrics. The collect operation extracts the APPs at the other end of the state buffer. The remaining state metrics of the buffer are updated by the update operation, based on information from the same received sample. A key feature of FOBCJR is that all update operations
II - 505 0-7803-8251-X/04/$17.00 ©2004 IEEE ISCAS 2004