Ultra-Fast-Scalable BCH Decoder with Efficient-Extended Fast Chien Search

Hans Kristian, Hernando Wahyono, Kiki Rizki
School of Electrical and Informatics Engineering
Bandung Institute of Technology (ITB)
Bandung, Indonesia
hernando.wahyono@gmail.com

Abstract—In this paper, we introduce new methods for implementing an ultra-fast, efficient BCH decoder of the kind frequently used in many applications. A reformulated inversionless Berlekamp-Massey algorithm is adopted to eliminate the finite-field inverter and reduce hardware complexity. Furthermore, we propose a Direct reformulated-inversionless Berlekamp-Massey algorithm (DriBM). In the Chien search stage, a Constant-Factor Multiplication-Free Matrix transform is introduced to avoid costly multiplications, which significantly reduces the area and the critical path. Moreover, we develop an Extended Fast Chien Search algorithm, which reduces computational complexity and area by nearly 33% compared to Constant-Factor MFTM. Using the proposed design, we build a BCH(15,7) decoder that reaches a speed of up to 2.2 GHz with a total area of 8170 μm² in 0.18 μm CMOS standard-cell technology. The proposed algorithms and architecture are both fast and area-efficient. The proposed BCH decoder architecture is also scalable to larger block lengths n and larger numbers of correctable errors t: we design BCH(63,51) using the same concept as BCH(15,7). In addition to the parallel BCH decoder, we design an area-efficient parallel GF multiplier and squarer that minimize the number of logic gates. The design has been implemented and verified on an Altera DE2 FPGA using codewords with various error positions and weights (0-2 errors, the guaranteed correction range). Due to its low complexity, it is suitable for VLSI implementation and provides excellent tradeoffs between correcting capacity, speed, and area penalties.
Keywords—BCH Decoder, VLSI Architecture, Ultra-Fast-Scalable-Efficient.

I. INTRODUCTION TO BCH

BCH (Bose-Chaudhuri-Hocquenghem) codes are commonly used to correct a small number of bit errors and have a wide range of applications in digital communications and storage. This paper proposes a new implementation of the BCH decoder needed by systems such as storage devices (CD, DVD), wireless or mobile communications, and digital television (DVB), with attention to the tradeoffs between correcting capacity, speed, and area penalties.

BCH codes are a class of powerful random-error-correcting cyclic codes [1-3]. For any positive integer $m \ge 3$ and desired error-correction capability $t < 2^{m-1}$, there exists a binary BCH code $C_{bch}(n,k)$ with the following properties:

• Number of parity bits: $n - k \le mt$
• Minimum Hamming distance: $d_{min} \ge 2t + 1$
• Error-correction capability: $t$ errors

BCH codes are able to correct any error pattern of weight $t$ or less in a code vector of length $n = 2^m - 1$. To construct a BCH codeword, a generator polynomial $g(x)$ is derived. The codeword of length $n$ is then constructed by calculating the parity bits from the message bits.

The common decoding procedure for binary BCH codes is:

• Calculate the syndrome values from the received word
• Determine the error-locator polynomial
• Find the roots of the error-locator polynomial and then correct the errors

II. PROPOSED BCH DESIGN

A. Proposed: Efficient Parallel Syndrome Calculation with Matrix Representation

Syndrome Calculator

In this paper, a matrix representation is proposed for computing the syndrome, which provides a simple and scalable approach. The syndrome is defined in terms of the parity-check matrix as follows [2]:

$$S = r \cdot H^{T} \qquad (1)$$

with the matrix $H$,

$$H = \begin{bmatrix} 1 & \alpha & \alpha^{2} & \cdots & \alpha^{n-1} \\ 1 & \alpha^{2} & (\alpha^{2})^{2} & \cdots & (\alpha^{2})^{n-1} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & \alpha^{2t} & (\alpha^{2t})^{2} & \cdots & (\alpha^{2t})^{n-1} \end{bmatrix} \qquad (2)$$

where
$r$: the received word, a row matrix of length $n$
$H$: the parity-check matrix
$\alpha$: a primitive element in GF($2^m$)

The matrix multiplication yields a row matrix with $2t$ components, whose $i$-th component is the $i$-th syndrome:

$$S = (S_1, S_2, \ldots, S_{2t}), \qquad S_i = r(\alpha^{i}) \qquad (3)$$

for $1 \le i \le 2t$, where $r(\alpha^{i})$ denotes the received word, read as a polynomial $r(x)$, evaluated at $\alpha^{i}$.
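As an illustration of the matrix form $S = r \cdot H^{T}$, the following Python sketch builds $H$ for BCH(15,7) and multiplies it by a received word. It is a software model under stated assumptions, not the hardware architecture of this paper: the field GF($2^4$) is assumed to be generated by the primitive polynomial $x^4 + x + 1$, field elements are stored as 4-bit integers, and the names `build_exp_table`, `parity_check_matrix`, and `syndrome` are hypothetical.

```python
# Assumption: GF(2^4) generated by the primitive polynomial x^4 + x + 1.
M = 4
N = (1 << M) - 1      # code length n = 2^m - 1 = 15
T = 2                 # BCH(15,7) corrects up to t = 2 errors
PRIM_POLY = 0b10011   # x^4 + x + 1

def build_exp_table():
    """Return [alpha^0, ..., alpha^(n-1)] as m-bit integers."""
    table, x = [], 1
    for _ in range(N):
        table.append(x)
        x <<= 1                  # multiply by alpha
        if x & (1 << M):         # reduce modulo the primitive polynomial
            x ^= PRIM_POLY
    return table

EXP = build_exp_table()

def parity_check_matrix():
    """H has 2t rows; row i (1-based) holds (alpha^i)^j for j = 0..n-1."""
    return [[EXP[(i * j) % N] for j in range(N)]
            for i in range(1, 2 * T + 1)]

def syndrome(r):
    """Compute S = r . H^T, where r[j] is the bit multiplying x^j."""
    result = []
    for row in parity_check_matrix():
        acc = 0
        for bit, h in zip(r, row):
            if bit:              # r is binary, so each term is 0 or h
                acc ^= h         # addition in GF(2^m) is XOR
        result.append(acc)
    return result

# The generator polynomial g(x) = 1 + x^4 + x^6 + x^7 + x^8 is itself a codeword.
codeword = [1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0]
print(syndrome(codeword))
```

Because the received word is binary, each syndrome component reduces to an XOR of selected entries of one row of $H$, which is why the matrix form maps naturally onto a parallel XOR network.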
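The $i$-th syndrome, as can be seen from the matrix multiplication, is

$$S_i = \sum_{j=0}^{n-1} r_j \, \alpha^{ij} = r_0 + r_1 \alpha^{i} + r_2 \alpha^{2i} + \cdots + r_{n-1} \alpha^{(n-1)i} \qquad (4)$$

where $r_j$ is the $j$-th bit of the received word and the code length is $n = 2^m - 1$.

978-1-4244-5539-3/10/$26.00 ©2010 IEEE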
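The sum in (4) is the received word evaluated as a polynomial at $\alpha^{i}$, so it can also be computed serially with Horner's rule. The sketch below is an illustrative software model only, again assuming GF($2^4$) with primitive polynomial $x^4 + x + 1$ and $\alpha$ represented by the integer 2; the helper names `gf_mul`, `gf_pow`, `syndrome_horner`, and `syndromes` are hypothetical and do not describe the parallel circuit proposed in this paper.

```python
# Assumption: GF(2^4) generated by x^4 + x + 1; alpha is the integer 2.
M = 4
PRIM_POLY = 0b10011
ALPHA = 2

def gf_mul(a, b):
    """Shift-and-add multiplication in GF(2^m)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & (1 << M):         # keep a within m bits
            a ^= PRIM_POLY
    return result

def gf_pow(a, e):
    """Raise a to the non-negative integer power e."""
    result = 1
    for _ in range(e):
        result = gf_mul(result, a)
    return result

def syndrome_horner(r, i):
    """Evaluate S_i = r(alpha^i); r[j] is the bit multiplying x^j."""
    x = gf_pow(ALPHA, i)
    acc = 0
    for bit in reversed(r):      # start from the highest-degree coefficient
        acc = gf_mul(acc, x) ^ bit
    return acc

def syndromes(r, t):
    """Return [S_1, ..., S_2t] for the received word r."""
    return [syndrome_horner(r, i) for i in range(1, 2 * t + 1)]

# Codeword g(x) = 1 + x^4 + x^6 + x^7 + x^8 with errors injected at x^2 and x^9.
received = [1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0]
S = syndromes(received, 2)
print(S)
```

For a binary code the even-indexed syndromes satisfy $S_{2i} = S_i^2$, so a quick sanity check on the result is `gf_mul(S[0], S[0]) == S[1]`.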