From Application to ASIP-based FPGA Prototype: a Case Study on Turbo Decoding

Olivier Muller, Amer Baghdadi, Michel Jézéquel
Electronics Department, TELECOM Bretagne, Technopôle Brest Iroise, 29238 Brest, France
{olivier.muller, amer.baghdadi, michel.jezequel}@telecom-bretagne.eu

Abstract

ASIP-based implementations constitute a key trend in SoC design, enabling optimal tradeoffs between performance and flexibility. This paper details a case study of an ASIP-based implementation of a high-throughput flexible turbo decoder. It introduces the turbo decoding application and proposes an Application-Specific Instruction-set Processor with a SIMD architecture, a specialized and extensible instruction set, and a 6-stage pipeline control. The proposed ASIP is developed in the LISA language and generated automatically using the Processor Designer framework from CoWare. The paper illustrates how the automatically generated RTL code of the ASIP can be adapted for rapid prototyping on FPGA reconfigurable logic and memory resources. On a Xilinx Virtex-II Pro FPGA, a single-ASIP prototype occupies 68% of the FPGA resources and achieves a 6.3 Mbit/s throughput when decoding a double-binary turbo code with 5 iterations.

1. Introduction

Applications in the field of digital communications are becoming more and more diversified and complex. This trend is driven by the emergence of turbo-communications, which generalize the principle of iterative processing introduced by the turbo codes [1]. Implementation of turbo-communication systems - such as channel decoding, equalization, demodulation, synchronization or MIMO systems - is becoming crucial to reach today's performance requirements in terms of transmission quality (e.g. throughput and error rates).
In addition to the continuously emerging new standards and applications in the digital communication domain, severe time-to-market constraints make it inevitable to resort to new design methodologies and to propose a flexible and efficient turbo-communication platform. Good tradeoffs between flexibility and performance can be achieved by using programmable/configurable processors rather than ASICs. Concerning turbo decoding, several turbo-decoder implementations have been proposed in recent years. Some of these implementations succeed in achieving high throughput for specific standards with fully dedicated architectures. In [2], an ASIC implementation enables high-performance turbo decoding dedicated to 3GPP standards. In [3], a new class of turbo codes more suitable for high-throughput implementation is proposed. However, such implementations do not take flexibility issues into account. Unlike these implementations, others include software and/or reconfigurable parts to achieve the required flexibility, at the cost of lower throughput [4]. In fact, the concept of the application-specific instruction-set processor (ASIP) [9] constitutes an appropriate solution for fulfilling the flexibility and performance constraints of emerging and future applications, as shown in [10] and [11]. Despite the appropriateness of the ASIP concept, the execution speed of ASIP instruction-set simulators (ISS) is too slow to validate a complete system, especially for digital communication applications, which imply very long error-rate simulations. Executing these simulations in a reasonable time requires running them on a hardware prototype. Therefore, in this context, system validation requires a proper prototyping flow. In this work, we present a flexible and high-performance ASIP model for turbo decoding and propose a validation flow for this ASIP from its high-level description to the FPGA prototype. The rest of the paper is organized as follows.
The next section presents the turbo decoding algorithm to better understand the subsequent sections. Section 3 details the proposed ASIP architecture model for turbo decoding. Section 4 describes the flow we propose to verify and prototype our processor. Then, this flow is illustrated in section 5 through an FPGA prototyping on a development board. Finally, section 6 summarizes the results obtained and concludes the paper.

2. Convolutional Turbo Decoding

In iterative decoding algorithms, the underlying turbo principle relies on extrinsic information exchange and iterative processing between different Soft Input Soft Output (SISO) modules. Using input information and a priori extrinsic information, each SISO module computes a posteriori extrinsic information. This a posteriori extrinsic information becomes the a priori information for the other modules and is exchanged via interleaving and deinterleaving processes. For convolutional turbo codes [1], classically constructed with two convolutional component codes, the SISO modules run the BCJR or forward-backward algorithm [5], which is the optimal algorithm for maximum a posteriori (MAP) decoding of convolutional codes (Figure 1).
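The extrinsic information exchange described above can be sketched as follows. This is a minimal illustrative Python sketch of the data flow only, not the paper's implementation: `siso_stub` is a hypothetical placeholder for a real BCJR/forward-backward SISO module, and all names (`turbo_decode`, `interleaver`, `iterations`) are assumptions introduced for illustration.

```python
import numpy as np

def siso_stub(sys_llr, apriori_llr):
    # Hypothetical stand-in for a BCJR SISO module. A real decoder would run
    # the forward-backward recursions over the component code's trellis; here
    # we merely damp the combined metric to illustrate the exchange structure.
    return 0.5 * (sys_llr + apriori_llr)

def turbo_decode(sys_llr, interleaver, iterations=5, siso=siso_stub):
    """Illustrative turbo-decoding loop: two SISO modules exchange
    extrinsic information via interleaving/deinterleaving."""
    n = len(sys_llr)
    deinterleaver = np.argsort(interleaver)  # inverse permutation
    ext_12 = np.zeros(n)   # extrinsic info passed SISO1 -> SISO2
    ext_21 = np.zeros(n)   # extrinsic info passed SISO2 -> SISO1
    for _ in range(iterations):
        # SISO1 works in natural order; its a priori input is the
        # (already deinterleaved) extrinsic output of SISO2.
        ext_12 = siso(sys_llr, ext_21)
        # SISO2 works in interleaved order; its extrinsic output is
        # deinterleaved before being fed back to SISO1.
        ext_21 = siso(sys_llr[interleaver], ext_12[interleaver])[deinterleaver]
    # Final a posteriori LLR combines the channel and both extrinsic terms.
    return sys_llr + ext_12 + ext_21

rng = np.random.default_rng(0)
llr = rng.normal(0.8, 1.0, size=8)              # toy channel LLRs
pi = rng.permutation(8)                          # toy interleaver
bits = (turbo_decode(llr, pi) < 0).astype(int)   # hard decisions
```

With zero iterations no extrinsic information is produced, so the output reduces to the channel LLRs; each additional iteration lets the two SISO modules refine each other's a priori input, which is the essence of the turbo principle.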