IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 10, NO. 3, JUNE 2002 279

Architectural Strategies for Low-Power VLSI Turbo Decoders

Guido Masera, Marco Mazza, Gianluca Piccinini, Fabrizio Viglione, and Maurizio Zamboni

Abstract—The use of “turbo codes” has been proposed for several applications, including the development of wireless systems, where highly reliable transmission is required at very low signal-to-noise ratios (SNR). The problem of extracting the best coding gains from this kind of code has been investigated in depth in recent years. The hardware implementation of turbo codes is also a very challenging topic, mainly because of the iterative nature of the decoding process, which demands an operating frequency much higher than the data rate; in wireless applications, the design constraints become even stricter because of the low-cost and low-power requirements.

This paper first presents a new architecture for the decoder core with improved area and power dissipation properties; then partitioning techniques are proposed to reduce the power consumption of the decoder memories. It is shown that most of the power is dissipated by the large RAM units required by the decoder, so the described technique is very efficient: an average power saving of 70% with an area overhead of 23% has been obtained on a set of analyzed architectures.

Index Terms—High performance, low-power design, low voltage, memory, turbo-decoding, partitioning, very large scale integration (VLSI).

I. INTRODUCTION

CONVOLUTIONAL concatenated codes with iterative decoding (“Turbo codes” [1]) have proved to be one of the most powerful solutions for high coding gain applications.

A concatenated encoder is composed of two or more recursive systematic convolutional encoders. Interleaving blocks are placed between the constituent encoders and work as memories in which data are written and read in different orders.
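The encoder structure described above can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the 4-state constituent code with octal generators (feedback 7, feedforward 5), the example permutation, and the function names `rsc_encode`, `interleave`, `deinterleave`, and `pccc_encode`. A parallel connection of two encoders is used only as one possible arrangement.

```python
from typing import List, Sequence, Tuple


def rsc_encode(bits: Sequence[int]) -> List[int]:
    """Parity stream of a 4-state recursive systematic convolutional encoder.

    Assumed generators (octal): feedback 7 (1 + D + D^2), feedforward 5 (1 + D^2).
    The systematic output equals the input, so only the parity is returned.
    """
    s1 = s2 = 0  # shift register: s1 is the most recent feedback bit
    parity = []
    for u in bits:
        a = u ^ s1 ^ s2        # recursive (feedback) bit
        parity.append(a ^ s2)  # feedforward taps 1 + D^2
        s1, s2 = a, s1         # shift the register
    return parity


def interleave(bits: Sequence[int], perm: Sequence[int]) -> List[int]:
    """Write in natural order, read back in the address order given by perm."""
    memory = list(bits)                     # write phase
    return [memory[addr] for addr in perm]  # read phase


def deinterleave(bits: Sequence[int], perm: Sequence[int]) -> List[int]:
    """Inverse operation: write at the permuted addresses, read in natural order."""
    memory = [0] * len(perm)
    for value, addr in zip(bits, perm):
        memory[addr] = value
    return memory


def pccc_encode(
    bits: Sequence[int], perm: Sequence[int]
) -> Tuple[List[int], List[int], List[int]]:
    """Rate-1/3 parallel concatenation: systematic bits plus two parity streams."""
    parity1 = rsc_encode(bits)                    # first encoder, natural order
    parity2 = rsc_encode(interleave(bits, perm))  # second encoder, permuted order
    return list(bits), parity1, parity2


if __name__ == "__main__":
    data = [1, 0, 1, 1, 0, 0, 1, 0]
    perm = [3, 7, 0, 5, 2, 6, 1, 4]  # hypothetical 8-entry permutation
    systematic, p1, p2 = pccc_encode(data, perm)
    print(systematic, p1, p2)
```

The interleaver is modeled here as a plain memory accessed with two different address sequences, which matches the description of interleavers as blocks that are written and read in different orders. Trellis termination and puncturing are omitted to keep the sketch short.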
There are two primary connection schemes: parallel concatenated convolutional codes (PCCC, the turbo code originally proposed in [1]) and serially concatenated convolutional codes (SCCC). Fig. 1 reports the general structures of the two schemes, showing the constituent convolutional encoders, the interleaver, the data stream to be encoded, and the encoded streams. SCCCs have been shown to yield performance comparable, and in some cases superior, to PCCC turbo codes [2]. In [3] more connection schemes are presented.

The decoder is composed of a concatenation of interleavers and soft decoders [3] (soft-in soft-out, SISO), which produce reliability indexes (soft information) related to the input and output symbol streams of each encoder. The whole decoder operates iteratively, and the decoding process is stopped when the desired level of reliability is reached.

Manuscript received July 27, 2000; revised April 6, 2001.
The authors are with the Dipartimento di Elettronica, Politecnico di Torino, Corso Duca degli Abruzzi 24-10129 Torino, Italy (e-mail: guido@vlsilab01.polito.it; mazza@vlsilab01.polito.it; gianluca@vlsilab01.polito.it; viglione@vlsilab01.polito.it; maurizio@vlsilab01.polito.it).
Publisher Item Identifier S 1063-8210(02)04168-9.

Fig. 1. Convolutional encoders with interleavers: (a) parallel (PCCC); (b) serial (SCCC).

Recently, turbo codes have been proposed for wireless communications, such as the universal mobile telecommunication system (UMTS) [4], [5] for the third generation of mobile communications; other applications of turbo codes are in standard protocols for disk drives [6] and in satellite and deep-space communications [7], such as the European Space Agency (ESA) mission Rosetta.

The design of low-cost and low-power turbo decoders is very important in wireless communication systems. Several papers have been published on the subject of low-power turbo decoder implementation.
In [8], [9], and [10], different stopping criteria are proposed to halt the decoding iterations through online monitoring of different combinations of performance-related quantities. In [11], data flow transformations are described as a method to reduce both the size of the storage blocks and the number of transfers to them, at the cost of some extra calculations. Algorithm transformations have also been analyzed with the aim of using the memories more effectively. That work obtained a power saving of about 60% and a speed-up of 70% with an area overhead of 20%, although these results refer only to the SISO units and not to the entire decoder architecture. In [12] and [13], analog implementations of turbo decoders are proposed as low-power solutions. However, a systematic study of how to optimize the power dissipation of the entire decoder has not yet been performed.

In this paper, an analysis of the distribution of the power consumption among the constituent blocks of the decoder is presented: as a result of this study, it is shown that memory blocks are the most critical units of the decoder in terms of power consumption. The paper also proposes two architectural solutions for low-power implementation: the first solution is a new architecture for the SISO unit, which offers roughly a 20% reduction