IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 16, NO. 2, FEBRUARY 2008

System Architecture and Implementation of MIMO Sphere Decoders on FPGA

Xinming Huang, Member, IEEE, Cao Liang, Student Member, IEEE, and Jing Ma, Member, IEEE

Abstract: Multiple-input–multiple-output (MIMO) systems use multiple antennas at both the transmitter and receiver ends for higher spectrum efficiency. The hardware implementation of MIMO detection becomes a challenging task as the computational complexity increases. This paper presents the architectures and implementations of two typical sphere decoding algorithms: the Viterbo–Boutros (VB) algorithm and the Schnorr–Euchner (SE) algorithm. A hardware/software codesign technique is applied to partition the decoding algorithm on a single field-programmable gate array (FPGA) device. Three levels of parallelism are explored to improve the decoding rate: the concurrent execution of the channel matrix preprocessing on an embedded processor and the decoding functions on customized hardware modules, the parallel decoding of real/imaginary parts for complex constellations, and the concurrent execution of multiple steps during the closest lattice point search. The decoders for a 4 × 4 MIMO system with 16-QAM modulation are prototyped on a Xilinx XC2VP30 FPGA device with a MicroBlaze soft core processor. The hardware prototypes of the SE and VB algorithms show that they support up to 81.5 and 36.1 Mb/s data rates at 20 dB signal-to-noise ratio, which are about 22 and 97 times faster than their respective implementations in a digital signal processor.

Index Terms: Field-programmable gate array (FPGA), lattice point search, multiple-input–multiple-output (MIMO) detection, parallel structure, sphere decoding, system-on-chip (SoC).

I. INTRODUCTION

Wireless communication systems are dense compositions of signal processing and VLSI technologies.
With the ever-increasing demand for higher data rates and better quality of service, VLSI design and implementation methods for wireless communications become more challenging, which urges researchers to provide new architectures and efficient implementations to meet high performance requirements. In recent years, interest in multiple-input–multiple-output (MIMO) systems has exploded. It is well known that MIMO systems are able to increase system capacity and improve communication reliability [1]–[3]. The applications of MIMO technology have emerged at the forefront of the developing standards for next-generation mobile communications and wireless networks. Combined with the orthogonal frequency-division multiplexing (OFDM) technique, MIMO is proposed to be incorporated into the fourth generation (4G) mobile communications system architectures to enhance voice and data transmissions. Recent initiatives for standardization of future MIMO communication systems, including UMTS (3GPP Release 7) [4], IEEE 802.11n wireless LAN [5], and IEEE 802.16e WiMax [6], reflect the importance of MIMO techniques.

Manuscript received May 2, 2006; revised June 13, 2007. This work was supported in part by the National Science Foundation under Grant EPS-0346411. X. Huang and C. Liang are with the Department of Electrical and Computer Engineering, Worcester Polytechnic Institute, Worcester, MA 01609 USA (e-mail: xhuang@ece.wpi.edu; cliang@ece.wpi.edu). J. Ma was with the Department of Electrical Engineering, University of New Orleans, New Orleans, LA 70148 USA. She is now with The MathWorks Inc., Natick, MA 01760 USA (e-mail: jing.ma@mathworks.com). Digital Object Identifier 10.1109/TVLSI.2007.912042

The information theory for MIMO systems has been well studied on performance parameters such as data rate and bit error rate (BER) [7].
The layered space-time receiver structures and coding schemes have allowed MIMO systems to approach the theoretical capacities on a multiantenna channel [8]. On the receiver end, one of the key functions is to perform channel decoding to recover the original data stream corresponding to each of the transmit antennas from the received signal vector and the estimated channel information.

Both lattice theory and coding theory are applied in the design of MIMO detection algorithms. In a multiple antenna channel environment, each of the transmitted signal vectors is aligned on the modulated constellation points. Therefore, a multilayered lattice is formed with a finite set of points, and MIMO detection is essentially an algorithm to search for the closest lattice point to the received vector. There are two typical classes of comprehensive search algorithms for a lattice without an exploitable structure. One is the Pohst strategy, which examines lattice points lying inside a hypersphere [9], [10]. The lattice decoding algorithm developed by Viterbo and Boutros is based on the Pohst strategy [11]. Another class of lattice search strategy is suggested by Schnorr and Euchner [12], based on examining the points inside the aforementioned hypersphere in zigzag order of lattice layers with nondecreasing distance from the received signal vector. A representative lattice decoding algorithm based on the Schnorr–Euchner (SE) strategy is applied by Agrell et al. [13]. Both lattice search algorithms solve the maximum-likelihood (ML) detection problem. Both algorithms are considered the most promising approaches for MIMO detection, and are also commonly referred to as sphere decoders since the algorithms search for the closest lattice point within a hypersphere.
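The closest-point search described above can be illustrated with a short software sketch. This is not the hardware architecture presented in this paper; it is a minimal depth-first search in the spirit of the SE strategy, under several assumptions: the channel is real valued and has full column rank, each layer takes values from a 4-PAM alphabet (as the real or imaginary part of a 16-QAM symbol would), and the function names `qr_gram_schmidt` and `sphere_decode` are hypothetical. At each layer the candidates are visited in order of increasing distance from the unconstrained center, and the squared radius shrinks whenever a full candidate vector is found.

```python
import math

def qr_gram_schmidt(H):
    """Modified Gram-Schmidt: return Q (list of columns) and upper-triangular R."""
    n, m = len(H), len(H[0])
    q_cols, R = [], [[0.0] * m for _ in range(m)]
    for j in range(m):
        v = [H[i][j] for i in range(n)]
        for k, q in enumerate(q_cols):
            R[k][j] = sum(q[i] * v[i] for i in range(n))
            v = [v[i] - R[k][j] * q[i] for i in range(n)]
        R[j][j] = math.sqrt(sum(x * x for x in v))  # assumed nonzero (full rank)
        q_cols.append([x / R[j][j] for x in v])
    return q_cols, R

def sphere_decode(H, y, alphabet=(-3.0, -1.0, 1.0, 3.0)):
    """Return the vector s over `alphabet` that minimizes ||y - H s||^2."""
    n, m = len(H), len(H[0])
    q_cols, R = qr_gram_schmidt(H)
    z = [sum(q[i] * y[i] for i in range(n)) for q in q_cols]  # z = Q^T y
    best = {"dist": math.inf, "s": None}  # current squared radius and point
    s = [0.0] * m

    def search(layer, partial):
        # Remove the contribution of the layers that are already decided.
        resid = z[layer] - sum(R[layer][j] * s[j] for j in range(layer + 1, m))
        center = resid / R[layer][layer]
        # Visit candidates nearest to the center first (zigzag-like order).
        for sym in sorted(alphabet, key=lambda a: abs(a - center)):
            dist = partial + (resid - R[layer][layer] * sym) ** 2
            if dist >= best["dist"]:
                break  # remaining candidates in this layer are farther away
            s[layer] = sym
            if layer == 0:
                best["dist"], best["s"] = dist, list(s)  # shrink the radius
            else:
                search(layer - 1, dist)

    search(m - 1, 0.0)
    return best["s"]
```

For example, `sphere_decode([[1, 0], [0, 1]], [0.9, -2.7])` returns `[1.0, -3.0]`. Starting with an infinite radius means the first leaf reached is the nearest-candidate-per-layer solution, so no initial radius has to be chosen; a Pohst-style search would instead fix a radius up front and scan the candidates inside it in natural order.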

Due to the complexity of the lattice decoding algorithms and the high data dependency among the decoding procedures, MIMO decoders are generally implemented on digital signal processors (DSPs), as in the Bell Labs layered space-time (BLAST) system [14], [15]. Because a DSP does not support parallel computation, the speed of the DSP implementation is often limited, especially as the number of antennas increases. The VLSI architectures of MIMO systems have been investigated recently. It is a challenging task to reduce the complexity of the