(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 8, No. 3, 2017

Area and Energy Efficient Viterbi Accelerator for Embedded Processor Datapaths

Abdul Rehman Buzdar*, Liguo Sun*, Muhammad Waqar Azhar‡, Muhammad Imran Khan†§, Rao Kashif†

* Department of Electronic Engineering and Information Science
† Micro/Nano Electronic System Integration R&D Center (MESIC)
University of Science and Technology of China (USTC), Hefei, China
‡ Department of Computer Science and Engineering, Chalmers University of Technology, Gothenburg, Sweden
§ Department of Electronics Engineering, University of Engineering and Technology Taxila, Pakistan

Abstract—The Viterbi algorithm is widely used in communication systems to efficiently decode convolutional codes. It is used in many applications, including cellular and satellite communication systems. Moreover, serializer-deserializers (SERDESs) with critical latency constraints also use the Viterbi algorithm in hardware implementations. We present the integration of a mixed hardware/software Viterbi accelerator unit with an embedded processor datapath to enhance processor performance in terms of execution time and energy efficiency. We then evaluate the Viterbi-accelerated embedded processor datapath on these two metrics. Our evaluation shows that the Viterbi-accelerated MicroBlaze soft-core embedded processor datapath is three times more cycle and energy efficient than a datapath lacking a Viterbi accelerator unit. This acceleration is achieved at the cost of some area overhead.

Keywords—Viterbi decoder; Codesign; FPGA; MicroBlaze; Embedded Processor

I. INTRODUCTION

Channel coding is used in wireless communication systems for reliable data transfer over noise-prone communication channels. Various forward error correction (FEC) schemes, e.g.
Low-density parity-check (LDPC), Reed-Solomon, Viterbi and Turbo codes, are used to meet the growing need to improve spectrum efficiency [1], [2], [3], [4], [5]. In FEC schemes the data is encoded using convolutional encoding, and at the receiver end the decoding is done by Viterbi or turbo decoders [21-31]. The Viterbi decoder is suitable for wireless communication systems in which the transmitted signals are corrupted by additive white Gaussian noise [6].

The decoding process in FEC schemes is computationally intensive and power hungry. Handheld devices are battery powered, so they must be energy efficient. Customized hardware implementations of these FEC decoders are performance and power efficient but lack flexibility. As wireless standards evolve over time, the hardware needs to be flexible. The Viterbi decoder can be implemented in software and executed on an embedded processor, but it will then require a large number of clock cycles. It can be implemented more efficiently in dedicated hardware, which requires few clock cycles, at the cost of flexibility. Today's high-speed communication systems require fast data rates, which can only be delivered using dedicated hardware solutions. Hardware modules such as USB, Ethernet, TCP/IP, CRC and CAN protocol are included in modern embedded processors [7], [8], [9] to speed up certain parts of applications in areas like signal processing, communication and control systems. In the same way, a Viterbi accelerator can be used effectively in programmable systems where a series of Viterbi decoding operations must be computed.

II. CONVOLUTIONAL ENCODING AND VITERBI DECODING

Convolutional encoding of data is implemented with a shift register having K - 1 memory elements and a cascaded network of exclusive-or gates. Here K is the constraint length, and the encoder has 2^(K-1) states. The shift register is a chain of flip-flops, and the output of the nth flip-flop goes as input into the (n+1)th flip-flop.
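As a rough illustration of the shift-register structure described above, the following Python sketch models a rate-1/n convolutional encoder in software. The function name `conv_encode` and the default generator taps (1, 1, 1) and (1, 0, 1), i.e. a K = 3, rate-1/2 code, are assumptions chosen for illustration. It is a behavioural model only, not the hardware or software implementation evaluated in this paper.

```python
def conv_encode(bits, generators=((1, 1, 1), (1, 0, 1))):
    """Behavioural model of a rate-1/n convolutional encoder.

    bits       -- iterable of input bits (0 or 1)
    generators -- one tap tuple of length K per output symbol
                  (illustrative default: K = 3, rate 1/2)
    """
    K = len(generators[0])      # constraint length
    state = [0] * (K - 1)       # K-1 memory elements, initially cleared
    out = []
    for b in bits:
        window = [b] + state    # current input followed by the stored bits
        for g in generators:
            # modulo-2 sum (XOR) of the positions selected by the taps
            out.append(sum(t & w for t, w in zip(g, window)) % 2)
        state = window[:-1]     # shift: newest bit enters, oldest is discarded
    return out


if __name__ == "__main__":
    # Input 1 0 1 1 produces the symbol pairs 11 10 00 01
    print(conv_encode([1, 0, 1, 1]))   # [1, 1, 1, 0, 0, 0, 0, 1]
```

With K - 1 = 2 memory elements the `state` list can take 2^(K-1) = 4 distinct values, which matches the number of encoder states.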
The data in each register is shifted to the next register, and the value in the last register is discarded. The combinational logic consisting of exclusive-or gates performs modulo-2 addition. The encoder outputs n symbols using the generator polynomials and the values in the shift register. Fig. 1 shows a convolutional encoder for K = 3, R = 1/2 and generator polynomials G1 = (1, 1, 1) and G2 = (1, 0, 1). The code rate is the ratio of the number of input bits to the number of output bits (R = m/n). Convolutional codes are efficient compared to block codes because every input bit has an impact on K successive output symbols [10]. Both the code complexity and the error correction capability grow with K. The decoder complexity and memory requirements also increase with increasing K.

Figure 1: Convolutional Encoder general architecture.

A trellis diagram is used to visualize the state transitions of an encoder, as shown in Fig. 2. The black lines represent input bit 0 and the dotted lines represent input bit 1. The trellis path of the input sequence is represented by the red lines. The basic concept is that the valid path through the trellis diagram is