Available Online at www.ijcsmc.com

International Journal of Computer Science and Mobile Computing
A Monthly Journal of Computer Science and Information Technology
ISSN 2320-088X
IJCSMC, Vol. 2, Issue 4, April 2013, pg. 146-154

RESEARCH ARTICLE
© 2013, IJCSMC All Rights Reserved

Hardware-Optimized Lattice Reduction Algorithm for WiMax/LTE MIMO Detection using VLSI

R. Ragumadhavan
Assistant Professor, Department of Electronics and Communication Engineering, PSNA College of Engineering and Technology, Dindigul, Tamilnadu, India
raguece85@gmail.com

Abstract— This paper presents the first ASIC implementation of a lattice-reduction (LR) algorithm that achieves ML diversity. The VLSI implementation is based on a novel hardware-optimized LLL algorithm with 70% lower complexity than the traditional complex LLL algorithm. This reduction is achieved by replacing all the computationally intensive CLLL operations (multiplication, division, and square root) with low-complexity additions and comparisons. The VLSI implementation uses a pipelined architecture that produces an LR-reduced matrix every 40 cycles, a 60% reduction compared to current implementations. The proposed design was synthesized in both 130nm and 65nm CMOS, achieving clock speeds of 332MHz and 833MHz, respectively. The 65nm result is a 4X improvement over the fastest LR implementation to date. The proposed LR implementation sustains a throughput of 2Gbps, thus achieving the high data rates required by future standards such as IEEE 802.16m (WiMAX) and LTE-Advanced.

Key Terms— WiMax; MIMO; Lattice; LTE

I. INTRODUCTION

Recently, lattice reduction (LR) has been proposed in conjunction with MIMO detection schemes to improve their performance by transforming the system model into an equivalent one with a more orthogonal channel matrix, thereby lowering the likelihood of detection errors due to noise perturbations [1].
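The effect of lattice reduction on the channel matrix can be illustrated with a small numerical sketch. The example below is hypothetical (not from the paper): a real-valued 2x2 channel H is post-multiplied by a hand-picked unimodular matrix T, which spans the same lattice but with a lower orthogonality defect (the defect equals 1 for a perfectly orthogonal basis).

```python
import numpy as np

# Hypothetical 2x2 real-valued example; practical MIMO channels are complex.
# H and the unimodular matrix T below are illustrative choices, not the paper's.
H = np.array([[1.0, 1.1],
              [0.9, 1.0]])      # ill-conditioned: columns nearly parallel
T = np.array([[1, -1],
              [0,  1]])         # unimodular: integer entries, det = +/-1
H_red = H @ T                   # reduced basis spans the same lattice as H

def orthogonality_defect(B):
    # product of column norms divided by |det|; 1 for an orthogonal basis
    return np.prod(np.linalg.norm(B, axis=0)) / abs(np.linalg.det(B))

print(orthogonality_defect(H))      # large defect before reduction
print(orthogonality_defect(H_red))  # much smaller defect after reduction
```

A detector operating on the better-conditioned H_red (and mapping the estimate back through T) is far less sensitive to noise perturbations, which is the motivation for LR-aided detection.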
The LLL algorithm (due to Lenstra, Lenstra, and Lovász) [2] is the most commonly used LR method; it has been shown to achieve ML diversity for low-complexity detectors [3] and to significantly improve the performance of more complex detectors such as K-Best [4]. A more efficient, complex-valued extension of LLL (known as CLLL) was developed in [5]. However, the VLSI implementation of CLLL remains problematic due to its computationally intensive operations and its non-deterministic complexity. Currently, only a small number of VLSI implementations of LR have been reported in the literature, such as [6], [7], and [8]. Each of these designs was implemented on an FPGA platform. The Clarkson algorithm (CA), presented in [8], is a variant of CLLL that achieves lower complexity by modifying the CLLL reduction criterion. However, CA, like CLLL, has the drawback of variable complexity, and it also relies on computationally intensive operations such as division and multiplication. Another complex LR algorithm, known as Seysen's algorithm (SA), was presented in [9]; however, we show that SA has a much higher computational complexity than both CA and CLLL. Thus, SA is even more problematic from an implementation point of view. Therefore, to achieve an efficient and high-throughput VLSI implementation of LR, there is a need for an algorithm with significantly reduced and deterministic complexity.

In this paper we propose the design and ASIC implementation of a modified CLLL algorithm that achieves a 70% reduction in complexity over existing LR algorithms (including CLLL [5], CA [8], and SA [9]) with effectively the same BER performance. Our algorithm, which we named HOLLL (Hardware-Optimized LLL), eliminates the need for all computationally intensive LLL operations (such as division and multiplication)
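For reference, the classical real-valued LLL procedure that these variants build on can be sketched as follows. This is a textbook implementation for illustration only; the complex-valued CLLL of [5] and the paper's HOLLL differ in their reduction criteria and in how the costly operations are realized.

```python
import numpy as np

def lll_reduce(B, delta=0.75):
    """Textbook real-valued LLL on the columns of B (illustrative sketch;
    not the paper's HOLLL). Returns an LLL-reduced basis of the same lattice."""
    B = B.astype(float).copy()
    n = B.shape[1]

    def gso(B):
        # Gram-Schmidt orthogonalization with the mu coefficients
        Q = np.zeros_like(B)
        mu = np.eye(n)
        for i in range(n):
            v = B[:, i].copy()
            for j in range(i):
                mu[i, j] = (B[:, i] @ Q[:, j]) / (Q[:, j] @ Q[:, j])
                v -= mu[i, j] * Q[:, j]
            Q[:, i] = v
        return Q, mu

    k = 1
    while k < n:
        Q, mu = gso(B)
        # size reduction: note the division and rounding that HOLLL avoids
        for j in range(k - 1, -1, -1):
            if abs(mu[k, j]) > 0.5:
                B[:, k] -= round(mu[k, j]) * B[:, j]
                Q, mu = gso(B)
        # Lovasz condition: multiplications that HOLLL replaces with
        # additions and comparisons
        if Q[:, k] @ Q[:, k] >= (delta - mu[k, k - 1] ** 2) * (Q[:, k - 1] @ Q[:, k - 1]):
            k += 1
        else:
            B[:, [k - 1, k]] = B[:, [k, k - 1]]   # swap columns k-1 and k
            k = max(k - 1, 1)
    return B

H = np.array([[1.0, 1.1],
              [0.9, 1.0]])
H_red = lll_reduce(H)
```

The size-reduction and Lovász-condition steps expose exactly the divisions, multiplications, and (in norm computations) square roots that the abstract identifies as the hardware bottleneck, and the data-dependent swap loop is the source of the algorithm's non-deterministic complexity.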