Fast Inverse Square Root Based Matrix Inverse For MIMO-LTE Systems

Chinmaya Mahapatra, Saad Mahboob, Victor C.M. Leung
Dept. of Electrical and Computer Engineering, University of British Columbia, Vancouver, Canada
Chinmaya@ece.ubc.ca, smahboob@ece.ubc.ca

Thanos Stouraitis
Dept. of Electrical and Computer Engineering, University of Patras, Rio, Greece
thanos@upatras.gr

Abstract-This paper addresses the design of a low-complexity, high-speed matrix inversion algorithm using a fast inverse square root, based on QR decomposition and a systolic array architecture. Matrix operations are the most costly computational modules within MIMO-LTE receivers. We demonstrate a novel approach to matrix inversion that reduces the cost of the MIMO receiver module in terms of latency and complexity. A 4x4 matrix inverse is implemented on a Xilinx Virtex-6 FPGA, and the module is optimized for speed and power by pipelining, which achieves better throughput. The results are compared with the state-of-the-art CORDIC-based squared Givens rotation technique.

Keywords-MIMO LTE; Fast inverse square root; QR decomposition; Systolic array; Xilinx Virtex-6 FPGA; Pipelining; CORDIC

I. INTRODUCTION

Multiple-Input Multiple-Output (MIMO) Long Term Evolution (LTE) [1], [2] is one of the new technologies in wireless communications that improve bandwidth utilization efficiency. Multi-user MIMO-LTE uses two popular digital schemes, Orthogonal Frequency Division Multiple Access (OFDMA) for the downlink and Single-Carrier Frequency Division Multiple Access (SC-FDMA) for the uplink, which provide high data rates in wireless environments. OFDMA achieves multiple access by assigning narrow sub-bands. Each narrow sub-band has a flat frequency response, so a frequency-selective channel is converted into many flat-fading sub-channels. This achieves higher MIMO spectral efficiency by averaging interference from neighboring cells, and it is less affected by various kinds of impulse noise.
Most channel estimation processes need to invert a matrix that is either the channel state information or a nonlinear function of it. Increasing the number of transmitter and receiver antennas provides a higher data rate, but the dimension of the matrix to be inverted also increases. Fast approaches to matrix inversion are therefore required. In this paper, we present a matrix inversion technique using a fast inverse square root based Givens rotation and optimize it for speed and power.

The paper is organized as follows. Section II gives a brief overview of various matrix inversion algorithms along with their drawbacks. Section III briefly describes the matrix inversion approach using the fast inverse square root based Givens rotation, QR decomposition and a systolic array. Section IV describes the FPGA implementation and its analysis. Section V outlines the error analysis. Finally, Section VI concludes the paper, followed by the references.

II. MATRIX INVERSION ALGORITHMS

Methods for computing a matrix inverse can be divided into two categories: iterative and direct. Iterative methods require an initial estimate of the solution and subsequent updates based on the error of the previous estimate. These methods normally involve high-complexity sequential matrix computations and are not particularly suitable for real-time implementation.

QR decomposition (QRD) is an attractive approach to matrix inversion because of its well-known numerical stability [3]. Several algorithms and architectures have been proposed for computing QRD-based matrix inversion. Those that employ the Gram-Schmidt [4] and conventional Givens rotation (CGR) algorithms are at a disadvantage from an implementation perspective because they require high-complexity square-root operations. While the shift-and-add nature of CORDIC-based matrix inversion [5] offers a low-complexity hardware implementation, its inherent latency can preclude it from high-performance applications.
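To make the square-root cost concrete: a conventional Givens rotation that zeroes an element b against a pivot a uses c = a/r and s = b/r with r = sqrt(a^2 + b^2). The following is a minimal Python sketch, not the hardware design of this paper, of how a fast inverse square root can supply 1/r so that c and s need only multiplications. The magic constant 0x5F3759DF, the single Newton-Raphson step and the function names are assumptions taken from the widely known 32-bit software variant; they are not specified in this part of the paper.

```python
import struct


def fast_inv_sqrt(x: float, newton_steps: int = 1) -> float:
    """Approximate 1/sqrt(x) for x > 0 (assumed single-precision variant)."""
    # Reinterpret the float32 bit pattern of x as an unsigned 32-bit integer.
    i = struct.unpack("<I", struct.pack("<f", x))[0]
    # One shift and one subtraction give the initial guess (assumed constant).
    i = 0x5F3759DF - (i >> 1)
    y = struct.unpack("<f", struct.pack("<I", i))[0]
    # Newton-Raphson refinement: multiplications and subtractions only.
    for _ in range(newton_steps):
        y = y * (1.5 - 0.5 * x * y * y)
    return y


def givens_coefficients(a: float, b: float):
    """Return (c, s) approximating c = a/r, s = b/r with r = sqrt(a^2 + b^2)."""
    inv_r = fast_inv_sqrt(a * a + b * b)  # replaces the square root and the division
    return a * inv_r, b * inv_r


if __name__ == "__main__":
    c, s = givens_coefficients(3.0, 4.0)
    print(c, s)  # close to 0.6 and 0.8
```

With one Newton-Raphson step the relative error of this approximation stays below about 0.2 percent; a second step tightens it at the cost of a few more multiplications.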
Squared Givens rotations (SGR) offer square-root-free processing, and a number of SGR-based matrix inversion architectures have been proposed [6], [7], [8]. We propose an approach, explained in the sections below, that replaces the square-root and division operations in the matrix inverse with shift and multiply operations. This reduces latency and increases speed compared with other architectures.

III. MATRIX INVERSION USING QR DECOMPOSITION AND SYSTOLIC ARRAY

In this paper we present results for inverting a matrix of size 4 × 4. The same idea, with a slight modification to the hardware, can be used for larger matrix sizes. The hardware design uses QR decomposition and systolic arrays [7].

A = QR    (1)

Let A be an n × p matrix of full rank p. The QR decomposition factors A into a triangular matrix R of size p × p and an orthogonal matrix Q using plane rotations.

This work was supported by the Canadian Natural Sciences and Engineering Research Council through grant STPGP 396756.

2012 International Conference on Control Engineering and Communication Technology 978-0-7695-4881-4/12 $26.00 © 2012 IEEE DOI 10.1109/ICCECT.2012.253 321
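As an illustration of equation (1), the following Python sketch triangularizes a matrix with plane (Givens) rotations and then inverts it by back substitution. It is a software sketch under stated assumptions, not the systolic array design of this paper: A is assumed square, real-valued and nonsingular, the function names are hypothetical, and the `rsqrt` parameter is a hypothetical hook that defaults to an exact 1/sqrt(x) and could be swapped for a fast inverse square root. Because the same rotations are applied to A and to the identity matrix, Qt·A = R holds by construction, so the inverse follows from solving R·X = Qt.

```python
import math


def exact_rsqrt(x: float) -> float:
    """Reference 1/sqrt(x); a fast approximation could be passed instead."""
    return 1.0 / math.sqrt(x)


def givens_qr(A, rsqrt=exact_rsqrt):
    """Return (Qt, R) with Qt * A = R upper triangular (A square, real)."""
    n = len(A)
    R = [row[:] for row in A]
    Qt = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for j in range(n):                # column being cleared
        for i in range(j + 1, n):     # zero R[i][j] against the pivot R[j][j]
            a, b = R[j][j], R[i][j]
            if b == 0.0:
                continue
            inv_r = rsqrt(a * a + b * b)
            c, s = a * inv_r, b * inv_r
            for M in (R, Qt):         # apply the same rotation to R and Qt
                for k in range(n):
                    top, bot = M[j][k], M[i][k]
                    M[j][k] = c * top + s * bot
                    M[i][k] = -s * top + c * bot
            R[i][j] = 0.0             # clear the rounding residue
    return Qt, R


def invert(A, rsqrt=exact_rsqrt):
    """Invert A by solving R * X = Qt column by column (back substitution)."""
    Qt, R = givens_qr(A, rsqrt)
    n = len(A)
    X = [[0.0] * n for _ in range(n)]
    for col in range(n):
        for i in range(n - 1, -1, -1):
            acc = Qt[i][col] - sum(R[i][k] * X[k][col] for k in range(i + 1, n))
            X[i][col] = acc / R[i][i]  # plain division kept in this sketch
    return X


if __name__ == "__main__":
    A = [[4.0, 1.0, 2.0, 0.0],
         [1.0, 5.0, 0.0, 1.0],
         [2.0, 0.0, 6.0, 1.0],
         [0.0, 1.0, 1.0, 3.0]]
    for row in invert(A):
        print(row)
```

The sketch keeps an ordinary division in the back-substitution step and uses real arithmetic; a MIMO channel matrix is complex-valued in general, so a practical version would need complex rotations.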