494 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 13, NO. 4, APRIL 2005

Memory Sub-Banking Scheme for High Throughput MAP-Based SISO Decoders

Mayank Tiwari, Yuming Zhu, and Chaitali Chakrabarti

Abstract—The sliding window (SW) approach has been proposed as an effective means of reducing the memory requirements as well as the decoding latency of the maximum a posteriori (MAP) based soft-input soft-output (SISO) decoder in a Turbo decoder. In this paper, we present sub-banked memory implementations (both single port and dual port) of the SW SISO decoder that achieve high throughput, low decoding latency, and reduced memory energy consumption. Our contributions include derivation of the optimal memory sub-banked structure for different SW configurations, study of the relationship between memory size and energy consumption for different SW configurations, and study of the effect of the number of sub-banks on the throughput/decoding latency for a given SW configuration.

Index Terms—High throughput, memory sub-banking, sliding window (SW), tradeoffs, Turbo decoder.

I. INTRODUCTION

In recent years, Turbo codes have become very popular because of their near-optimal performance [1], and have been adopted in mobile standards such as 3GPP for IMT-2000 and wideband code division multiple access (WCDMA). The superior performance is due to a combination of factors including parallel concatenated coding, iterative decoding, and a large interleaver size. The large frame size of Turbo codes and the iterative decoding process result in large decoding latency. This latency has to be reduced in order to make Turbo-based systems acceptable for real-time voice communication and other applications that require instant data processing, like hard disk storage and optical transmission. The Turbo decoder consists of two soft-input soft-output (SISO) decoders and interleavers/de-interleavers; the decoding latency is a function of the interleaver latency and the SISO decoding latency.
In order to reduce the decoding latency and increase the throughput of the SISO decoder, the sliding window (SW) approach has been used in [2]–[6]. A comprehensive study of the tradeoffs between area, energy, and throughput for different SW configurations was done in [3] for monolithic memory. A similar analysis of computational hardware and memory for the SISO a posteriori probability (APP) algorithm using a tile graph was presented in [6].

Most of the existing work on Turbo decoder architectures assumes a monolithic memory structure. However, memory sub-banking is an effective means of achieving high throughput, as discussed in [4], [5], [7]. The implementation in [4] employed three unique data RAMs with read-modify-write access in a single cycle, while the one in [5] employed two dual-port data RAMs. The discussions in [4], [5] were limited to the case presented in this paper.

Manuscript received May 16, 2004; revised September 30, 2004. This work was supported by CEINT at ASU, and by NSF-ITR under Grant 0325761.
The authors are with the Department of Electrical Engineering, Arizona State University, Tempe, AZ 85287 USA (e-mail: yuming.zhu@asu.edu).
Digital Object Identifier 10.1109/TVLSI.2004.842937

In this work, we derive a systematic way of generating sub-banked structures using standard RAMs for high throughput SW-based SISO decoders. We evaluate the structures with respect to area, throughput, and energy consumption and provide a tradeoff analysis between the different parameters. The main contributions are as follows.

• Derivation of the optimal single-port memory sub-banking structure (number and size of each sub-bank) that supports very high throughput and low SISO decoding latency for a given SW configuration (corresponding to a specific value of ).
• Study of the relationship between number of sub-banks, memory size, throughput, and memory energy for a given SW configuration.
• Study of throughput, number of sub-banks, memory size, and memory energy for different SW configurations.
Such a comprehensive study is intended to aid the designer in choosing the optimal memory configuration given the constraints on memory size, number of sub-banks, throughput/decoding latency, and energy consumption.

The rest of the paper is organized as follows. Section II gives a brief description of Turbo coders, the maximum a posteriori (MAP) algorithm, and the application of SW to the MAP-based SISO decoder. Section III derives the memory size and number of single-port and dual-port sub-banks for the proposed optimal sub-banking structure, followed by tradeoffs between throughput, memory size, number of sub-banks, and energy consumption for nonoptimal schemes. This section also provides comparisons with previous work. Section IV concludes the paper.

II. SISO DECODER ARCHITECTURE

A. Turbo Decoder Structure

A Turbo encoder and an iterative decoder are shown in Fig. 1. The Turbo encoder consists of two recursive systematic convolutional (RSC) encoders and an interleaver. The Turbo decoder consists of two SISO decoders (corresponding to the two RSC encoders), with an interleaver and a de-interleaver placed between the two decoders. The first SISO decoder generates soft outputs, which are interleaved and used to produce an improved estimate of the a priori probabilities of the information sequence for the second decoder. The output of the second SISO decoder is fed to the first SISO decoder through the de-interleaver. The SISO decoders are typically implemented using the MAP class of algorithms. The MAP algorithm estimates the most likely information bit in a coded sequence.

B. MAP Algorithm

The MAP algorithm minimizes the symbol (or bit) error probability. For each transmitted symbol, it generates a soft output in the form of APP based on the received sequence. The log-likelihood ratio can be computed as

$$\Lambda(u_k) = \max^*_{e \in E_1}\left[\alpha_{k-1}(s^S(e)) + \gamma_k(e) + \beta_k(s^E(e))\right] - \max^*_{e \in E_0}\left[\alpha_{k-1}(s^S(e)) + \gamma_k(e) + \beta_k(s^E(e))\right] \tag{1}$$

for $1 \le k \le N$, where $N$ is the frame length, $E_u$ is the set of trellis transitions that are caused by input $u$, and $s^S(e)$ and $s^E(e)$ are the start and end states of trellis transition $e$. $\alpha_k$ and $\beta_k$ are the path metrics, and $\gamma_k$ is the branch metric at time $k$.
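The log-likelihood computation described above combines, for each input value, the forward path metric at the start state, the branch metric, and the backward path metric at the end state of every trellis transition. A minimal Python sketch of one such step follows, assuming the widely used log-domain form in which the terms are combined with the max* operator (the Jacobian logarithm). The function names, the tuple layout used for transitions, the sign convention (bit 1 minus bit 0), and the toy 2-state numbers are all hypothetical illustrations and are not taken from the paper.

```python
import math
from functools import reduce


def max_star(x, y):
    """Jacobian logarithm: ln(e^x + e^y) = max(x, y) + ln(1 + e^-|x - y|)."""
    return max(x, y) + math.log1p(math.exp(-abs(x - y)))


def max_star_all(values):
    """Fold the two-input max_star over a non-empty sequence."""
    return reduce(max_star, values)


def llr(alpha_prev, beta_cur, transitions):
    """Log-likelihood ratio for one trellis step (assumed convention: bit 1 minus bit 0).

    alpha_prev[s]: forward path metric of state s at time k-1
    beta_cur[s]:   backward path metric of state s at time k
    transitions:   iterable of (start_state, end_state, input_bit, gamma),
                   where gamma is the branch metric at time k
    """
    sums = {0: [], 1: []}
    for start, end, bit, gamma in transitions:
        sums[bit].append(alpha_prev[start] + gamma + beta_cur[end])
    return max_star_all(sums[1]) - max_star_all(sums[0])


# Hypothetical 2-state trellis step with made-up metrics.
alpha_prev = [0.0, -1.0]
beta_cur = [-0.5, 0.0]
transitions = [
    (0, 0, 0, 0.2),
    (0, 1, 1, 1.1),
    (1, 0, 1, -0.3),
    (1, 1, 0, 0.4),
]
llr_value = llr(alpha_prev, beta_cur, transitions)
```

Under this assumed sign convention a positive `llr_value` favors input bit 1. Replacing `max_star` with plain `max` drops the correction term and gives the Max-Log-MAP approximation.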
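It is shown in [1] that the forward path metric $\alpha_k$ can be computed with a forward recursion and the backward path metric $\beta_k$ can be computed with a backward recursion. The function $\max^*$ is defined as $\max^*(x, y) = \max(x, y) + \ln\left(1 + e^{-|x - y|}\right)$, where $\ln\left(1 + e^{-|x - y|}\right)$ is a correction factor [8].

1063-8210/$20.00 © 2005 IEEE

C. Sliding Window Approach

In the standard MAP-based SISO decoder, the decoding latency is equal to the received frame size. For large frames, the memory required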