494 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 13, NO. 4, APRIL 2005
Memory Sub-Banking Scheme for High Throughput
MAP-Based SISO Decoders
Mayank Tiwari, Yuming Zhu, and Chaitali Chakrabarti
Abstract—The sliding window (SW) approach has been proposed as an effective means of reducing both the memory requirements and the decoding latency of the maximum a posteriori (MAP) based soft-input soft-output (SISO) decoder in a Turbo decoder. In this paper, we present sub-banked memory implementations (both single port and dual port) of the SW SISO decoder that achieve high throughput, low decoding latency, and reduced memory energy consumption. Our contributions include derivation of the optimal memory sub-banked structure for different SW configurations, a study of the relationship between memory size and energy consumption for different SW configurations, and a study of the effect of the number of sub-banks on the throughput/decoding latency for a given SW configuration.
Index Terms—High throughput, memory sub-banking, sliding window
(SW), tradeoffs, Turbo decoder.
I. INTRODUCTION
In recent years, Turbo codes have become very popular because of their near-optimal performance [1], and they have been adopted in mobile standards such as 3GPP for IMT-2000 and wideband code division multiple access (WCDMA). Their superior performance is due to a combination of parallel concatenated coding, iterative decoding, large interleaver size, etc. The large frame size of Turbo codes and the iterative decoding process result in a large decoding latency. This latency has to be reduced to make Turbo-based systems acceptable for real-time voice communication and other applications that require instant data processing, such as hard disk storage and optical transmission.
The Turbo decoder consists of two soft-input soft-output (SISO) decoders and interleavers/de-interleavers; the decoding latency is a function of the interleaver latency and the SISO decoding latency. To reduce the decoding latency and increase the throughput of the SISO decoder, the sliding window (SW) approach has been used in [2]–[6]. A comprehensive study of the tradeoffs between area, energy, and throughput for different SW configurations was done in [3] for monolithic memory. A similar analysis of the computational hardware and memory for the SISO a posteriori probability (APP) algorithm using a tile graph was presented in [6].
Most of the existing work on Turbo decoder architectures assumes a monolithic memory structure. However, memory sub-banking is an effective means of achieving high throughput, as discussed in [4], [5], [7]. The implementation in [4] employed three unique data RAMs with read-modify-write access in a single cycle, while the one in [5] employed two dual-port data RAMs. The discussions in [4], [5] were limited to the case presented in this paper.
In this work, we derive a systematic way of generating sub-banked structures using standard RAMs for high-throughput SW-based SISO decoders. We evaluate the structures with respect to area, throughput, and energy consumption, and provide a tradeoff analysis between the different parameters. The main contributions are as follows.
• Derivation of the optimal single-port memory sub-banking structure (number and size of each sub-bank) that supports very high throughput and low SISO decoding latency for a given SW configuration (corresponding to a specific value of ).
• Study of the relationship between number of sub-banks, memory size, throughput, and memory energy for a given SW configuration.
• Study of throughput, number of sub-banks, memory size, and memory energy for different SW configurations.

Manuscript received May 16, 2004; revised September 30, 2004. This work was supported by CEINT at ASU, and by NSF-ITR under Grant 0325761.
The authors are with the Department of Electrical Engineering, Arizona State University, Tempe, AZ 85287 USA (e-mail: yuming.zhu@asu.edu).
Digital Object Identifier 10.1109/TVLSI.2004.842937
Such a comprehensive study is intended to aid the designer in choosing the optimal memory configuration given constraints on memory size, number of sub-banks, throughput/decoding latency, and energy consumption.
The rest of the paper is organized as follows. Section II gives a brief description of Turbo coders, the maximum a posteriori (MAP) algorithm, and the application of SW to the MAP-based SISO decoder. Section III derives the memory size and number of single-port and dual-port sub-banks for the proposed optimal sub-banking structure, followed by tradeoffs between throughput, memory size, number of sub-banks, and energy consumption for nonoptimal schemes. This section also provides comparisons with previous work. Section IV concludes the paper.
II. SISO DECODER ARCHITECTURE
A. Turbo Decoder Structure
A Turbo encoder and an iterative decoder are shown in Fig. 1. The Turbo encoder consists of two recursive systematic convolutional (RSC) encoders and an interleaver. The Turbo decoder consists of two SISO decoders (corresponding to the two RSC encoders), with an interleaver and a de-interleaver placed between the two decoders. The first SISO decoder generates soft outputs, which are interleaved and used to produce an improved estimate of the a priori probabilities of the information sequence for the second decoder. The output of the second SISO decoder is fed back to the first SISO decoder through the de-interleaver. The SISO decoders are typically implemented using the MAP class of algorithms. The MAP algorithm estimates the most likely information bit in a coded sequence.
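The interleaver and de-interleaver between the two SISO decoders can be viewed as a permutation and its inverse: soft outputs reordered by one are restored to their original positions by the other. The following is a minimal Python sketch, assuming a seeded pseudo-random permutation. The 3GPP standard specifies its own interleaver construction, and the function names `make_interleaver`, `interleave`, and `deinterleave` are hypothetical.

```python
import random


def make_interleaver(n, seed=0):
    """Return a pseudo-random permutation of range(n).

    Hypothetical stand-in for a real Turbo interleaver (e.g., the 3GPP one).
    """
    perm = list(range(n))
    random.Random(seed).shuffle(perm)
    return perm


def interleave(x, perm):
    """Output position i takes the value at input position perm[i]."""
    return [x[p] for p in perm]


def deinterleave(y, perm):
    """Inverse permutation: the value y[i] returns to position perm[i]."""
    x = [None] * len(perm)
    for i, p in enumerate(perm):
        x[p] = y[i]
    return x


if __name__ == "__main__":
    # Soft outputs of the first SISO decoder (illustrative values only).
    llr = [0.5, -1.2, 2.0, 0.1, -0.3, 1.7, -2.2, 0.9]
    perm = make_interleaver(len(llr), seed=1)
    to_second = interleave(llr, perm)     # order seen by SISO decoder 2
    back = deinterleave(to_second, perm)  # order seen by SISO decoder 1
    assert back == llr
```

In this sketch the de-interleaver needs no table of its own; the same `perm` array serves both directions.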
B. MAP Algorithm
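The MAP algorithm minimizes the symbol (or bit) error probability. For each transmitted symbol, it generates a soft output in the form of an APP based on the received sequence. The log-likelihood ratio $\Lambda(u_k)$ can be computed as

$$\Lambda(u_k) = \max^*_{e \in E^{1}} \left\{ \alpha_{k-1}\left(s^S(e)\right) + \gamma_k(e) + \beta_k\left(s^E(e)\right) \right\} - \max^*_{e \in E^{0}} \left\{ \alpha_{k-1}\left(s^S(e)\right) + \gamma_k(e) + \beta_k\left(s^E(e)\right) \right\} \qquad (1)$$

for $k = 1, \ldots, N$, where $N$ is the frame length, $E^{i}$ is the set of trellis transitions that are caused by input $i$, $i \in \{0, 1\}$, and $s^S(e)$ and $s^E(e)$ are the start and end states of trellis transition $e$. $\alpha$ and $\beta$ are the path metrics, and $\gamma_k$ is the branch metric at time $k$. It is shown in [1] that $\alpha$ can be computed with a forward recursion and $\beta$ can be computed with a backward recursion. The function $\max^*$ is defined as $\max^*(a, b) = \max(a, b) + \ln\left(1 + e^{-|a-b|}\right)$, where $\ln\left(1 + e^{-|a-b|}\right)$ is a correction factor [8].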
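Equation (1) and the $\max^*$ correction can be exercised on a toy trellis. The sketch below is a full-frame log-MAP computation in Python, not the sliding-window or sub-banked architecture of this paper. It assumes a hypothetical edge-list trellis format `(start_state, end_state, input_bit)`, precomputed log-domain branch metrics $\gamma$, and a trellis that starts in state 0 and is left unterminated (all final $\beta$ set to zero). The names `max_star` and `log_map_llr` are illustrative. Because `max_star` here is the exact Jacobian logarithm, the result equals the log ratio of path-metric sums over all input sequences; replacing it with plain `max` would give the Max-Log-MAP approximation.

```python
import math

NEG_INF = -math.inf


def max_star(a, b):
    """Jacobian logarithm: ln(e^a + e^b) = max(a, b) + ln(1 + e^{-|a-b|})."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


def log_map_llr(edges, gamma, num_states):
    """Full-frame log-MAP LLRs following (1).

    edges: list of (start_state, end_state, input_bit) tuples (hypothetical format).
    gamma[k][e]: log-domain branch metric of edge e at step k (0-based).
    Assumes the trellis starts in state 0 and is not terminated.
    """
    n = len(gamma)
    # Forward recursion: alpha[k] holds the state metrics after k steps.
    alpha = [[NEG_INF] * num_states for _ in range(n + 1)]
    alpha[0][0] = 0.0
    for k in range(n):
        for e, (s, t, _) in enumerate(edges):
            alpha[k + 1][t] = max_star(alpha[k + 1][t], alpha[k][s] + gamma[k][e])

    # Backward recursion; zero metrics for every state at the unterminated end.
    beta = [[NEG_INF] * num_states for _ in range(n + 1)]
    beta[n] = [0.0] * num_states
    for k in range(n - 1, -1, -1):
        for e, (s, t, _) in enumerate(edges):
            beta[k][s] = max_star(beta[k][s], gamma[k][e] + beta[k + 1][t])

    # LLR: max* over input-1 edges minus max* over input-0 edges, as in (1).
    llr = []
    for k in range(n):
        num = den = NEG_INF
        for e, (s, t, u) in enumerate(edges):
            metric = alpha[k][s] + gamma[k][e] + beta[k + 1][t]
            if u == 1:
                num = max_star(num, metric)
            else:
                den = max_star(den, metric)
        llr.append(num - den)
    return llr


if __name__ == "__main__":
    # Two-state accumulator trellis: next_state = state XOR input (illustrative).
    edges = [(0, 0, 0), (0, 1, 1), (1, 1, 0), (1, 0, 1)]
    gamma = [[0.0, 1.0, 0.0, 1.0]] * 3  # every input-1 branch favoured by +1
    # Each LLR is 1.0 up to floating-point rounding.
    print(log_map_llr(edges, gamma, num_states=2))
```

This sketch keeps $\alpha$ and $\beta$ for the whole frame; that frame-length storage and latency is what the SW approach aims to reduce.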
C. Sliding Window Approach
In the standard MAP-based SISO decoder, the decoding latency is equal to the received frame size. For large frames, the memory required
1063-8210/$20.00 © 2005 IEEE