188 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 16, NO. 2, FEBRUARY 2008
System Architecture and Implementation of MIMO
Sphere Decoders on FPGA
Xinming Huang, Member, IEEE, Cao Liang, Student Member, IEEE, and Jing Ma, Member, IEEE
Abstract—Multiple-input–multiple-output (MIMO) systems
use multiple antennas at both the transmitter and receiver ends for
higher spectrum efficiency. The hardware implementation of
MIMO detection becomes a challenging task as the computa-
tional complexity increases. This paper presents the architectures
and implementations of two typical sphere decoding algo-
rithms, including the Viterbo–Boutros (VB) algorithm and the
Schnorr–Euchner (SE) algorithm. A hardware/software codesign
technique is applied to partition the decoding algorithm on a
single field-programmable gate array (FPGA) device. Three levels
of parallelism are explored to improve the decoding rate: the
concurrent execution of the channel matrix preprocessing on an
embedded processor and the decoding functions on customized
hardware modules, the parallel decoding of real/imaginary parts
for complex constellation, and the concurrent execution of mul-
tiple steps during the closest lattice point search. The decoders for
a 4 × 4 MIMO system with 16-QAM modulation are prototyped
on a Xilinx XC2VP30 FPGA device with a MicroBlaze soft core
processor. The hardware prototypes of the SE and VB algorithms
show that they support up to 81.5 and 36.1 Mb/s data rates at
20 dB signal-to-noise ratio, which are about 22 and 97 times faster
than their respective implementations in a digital signal processor.
Index Terms—Field-programmable gate array (FPGA), lattice
point search, multiple-input–multiple-output (MIMO) detection,
parallel structure, sphere decoding, system-on-chip (SoC).
I. INTRODUCTION
Wireless communication systems are dense compositions
of signal processing and VLSI technologies. With
the ever-increasing demand for higher data rates and better quality
of service, VLSI design and implementation for wireless
communications become more challenging, which urges
researchers to provide new architectures and efficient
implementations to meet high-performance requirements. In recent
years, interest in multiple-input–multiple-output (MIMO)
systems has exploded. It is well known that MIMO systems
are able to increase system capacity and improve communica-
tion reliability [1]–[3]. The applications of MIMO technology
have emerged at the forefront of the developing standards for
next-generation mobile communications and wireless networks.
Combined with the orthogonal frequency-division multiplexing
(OFDM) technique, MIMO is proposed to be incorporated into
the fourth generation (4G) mobile communications system
architectures to enhance voice and data transmissions. Recent
initiatives for standardization of future MIMO communication
systems, including UMTS (3GPP Release 7) [4], IEEE 802.11n
wireless LAN [5], and IEEE 802.16e WiMAX [6], reflect the
importance of MIMO techniques.
Manuscript received May 2, 2006; revised June 13, 2007. This work was
supported in part by the National Science Foundation under Grant EPS-0346411.
X. Huang and C. Liang are with the Department of Electrical and
Computer Engineering, Worcester Polytechnic Institute, Worcester, MA 01609 USA
(e-mail: xhuang@ece.wpi.edu; cliang@ece.wpi.edu).
J. Ma was with the Department of Electrical Engineering, University of New
Orleans, New Orleans, LA 70148 USA. She is now with The MathWorks Inc.,
Natick, MA 01760 USA (e-mail: jing.ma@mathworks.com).
Digital Object Identifier 10.1109/TVLSI.2007.912042
The information theory for MIMO systems has been well
studied on performance parameters such as data rate and bit
error rate (BER) [7]. The layered space-time receiver struc-
tures and coding schemes have allowed the MIMO systems to
approach the theoretical capacities on a multiantenna channel
[8]. On the receiver end, one of the key functions is to perform
channel decoding to recover the original data stream
corresponding to each of the transmit antennas from the
received signal vector and the estimated channel information.
Both lattice theory and coding theory are applied in the de-
sign of MIMO detection algorithms. In a multiple antenna
channel environment, each of the transmitted signal vectors
is aligned on the modulated constellation points. Therefore, a
multilayered lattice is formed with a set of finite points and
the MIMO detection is essentially an algorithm to search for
the closest lattice point to the received vector. There are two
typical classes of comprehensive search algorithms for a lattice
without an exploitable structure. One is the Pohst strategy that
examines lattice points lying inside a hypersphere [9], [10]. The
lattice decoding algorithm developed by Viterbo and Boutros is
based on the Pohst strategy [11]. Another class of lattice search
strategy is suggested by Schnorr and Euchner [12], based on
examining the points inside the aforementioned hypersphere in
zigzag order of lattice layers with nondecreasing distance from
the received signal vector. A representative lattice decoding
algorithm based on Schnorr–Euchner (SE) strategy is applied
by Agrell et al. [13]. Both lattice search algorithms solve the
maximum-likelihood (ML) detection problem. Both algorithms
are considered the most promising approaches for MIMO
detection, and are also commonly referred to as sphere decoders
since the algorithms search for the closest lattice point within a
hypersphere.
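The closest-lattice-point view of MIMO detection described above can be illustrated with a minimal sketch. It is not the architecture presented in this paper. It assumes a real-valued system model in which the channel matrix has already been reduced to an upper-triangular matrix R (for example, by QR decomposition), and in which each real or imaginary part of a 16-QAM symbol is drawn from the 4-PAM alphabet {-3, -1, 1, 3}. The names ml_detect, se_order, sphere_decode, and PAM4 are hypothetical and chosen only for illustration. The sketch contrasts an exhaustive ML search with a depth-first search that visits candidates in SE (zigzag) order and prunes any branch that falls outside the current hypersphere.

```python
from itertools import product

# Assumed alphabet: one real dimension of 16-QAM (4-PAM).
PAM4 = (-3, -1, 1, 3)


def ml_detect(H, y, alphabet=PAM4):
    """Exhaustive ML detection: minimize ||y - H s||^2 over all candidates s."""
    n_tx = len(H[0])
    best_s, best_d = None, float("inf")
    for s in product(alphabet, repeat=n_tx):
        d = 0.0
        for row, y_i in zip(H, y):
            r = y_i - sum(h * x for h, x in zip(row, s))
            d += r * r
        if d < best_d:
            best_s, best_d = s, d
    return best_s, best_d


def se_order(center, alphabet=PAM4):
    """Candidates in zigzag order: nondecreasing distance from the center."""
    return sorted(alphabet, key=lambda a: abs(a - center))


def sphere_decode(R, z, alphabet=PAM4, radius_sq=float("inf")):
    """Depth-first closest-point search for upper-triangular R.

    Minimizes ||z - R s||^2 layer by layer, from the last row upward.
    Returns (None, radius_sq) if no candidate lies inside the initial sphere.
    """
    n = len(z)
    s = [0] * n
    best = {"s": None, "d": radius_sq}

    def search(level, partial):
        # Remove the contribution of symbols already fixed in deeper layers.
        interf = sum(R[level][j] * s[j] for j in range(level + 1, n))
        center = (z[level] - interf) / R[level][level]
        for a in se_order(center, alphabet):
            d = partial + (R[level][level] * (a - center)) ** 2
            if d >= best["d"]:
                break  # later candidates in SE order are at least as far away
            s[level] = a
            if level == 0:
                # A full candidate inside the sphere: shrink the radius.
                best["s"], best["d"] = tuple(s), d
            else:
                search(level - 1, d)

    search(n - 1, 0.0)
    return best["s"], best["d"]
```

Because candidates in each layer are visited in order of nondecreasing distance, the first candidate that falls outside the current radius ends the enumeration for that layer. This is the property that distinguishes the SE strategy from a plain scan of all points inside the hypersphere.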
Due to the complexity of the lattice decoding algorithms and
the high data dependency among the decoding procedures, the
MIMO decoders are generally implemented on digital signal
processors (DSPs), such as the Bell Labs layered space-time
(BLAST) system [14], [15]. Because a DSP does not support parallel
computation, the speed of the DSP implementation is often
limited, especially as the number of antennas increases. The
VLSI architectures of MIMO systems have been investigated
recently. It is a challenging task to reduce the complexity of the
1063-8210/$25.00 © 2007 IEEE