This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS

Low-Power Correlation for IEEE 802.16 OFDM Synchronization on FPGA

Thinh H. Pham, Suhaib A. Fahmy, and Ian Vince McLoughlin

Abstract— This brief compares the use of multiplierless and DSP slice-based cross-correlation for IEEE 802.16d orthogonal frequency division multiplexing (OFDM) timing synchronization on Xilinx Virtex-6 and Spartan-6 field programmable gate arrays (FPGAs). The natural approach, given the availability of embedded DSP blocks on these FPGAs, would be to implement standard multiplier-based cross-correlation. However, this can consume a significant number of DSP blocks, which may not fit on low-power devices. Hence, we compare a DSP48E1 slice-based design to four different quantizations of multiplierless correlation in terms of resource utilization and power consumption. OFDM timing synchronization accuracy is evaluated for each system at different signal-to-noise ratios. Results show that even relatively coarse multiplierless coefficient quantization can yield accurate timing synchronization, and does so at high clock speeds. Multiplierless designs enjoy reduced power consumption over the DSP48E1 slice-based design, and can be used where DSP slice resources are insufficient, such as on low-power FPGA devices.

Index Terms—Correlation, cognitive radio, field-programmable gate arrays (FPGA), IEEE 802.16 standards, orthogonal frequency division multiplexing (OFDM).

I. INTRODUCTION

Orthogonal frequency division multiplexing (OFDM) is an effective modulation technique used in both wired and wireless communication systems.
In particular, thanks to its spectral efficiency and robustness to multipath fading, OFDM has been specified for multiple high bit-rate wireless transmission systems, such as the wireless local area networks of IEEE 802.11 and the metropolitan area networks of IEEE 802.16d. However, OFDM performance is sensitive to receiver synchronization. Frequency offset causes inter-subcarrier interference, and errors in timing synchronization can lead to inter-symbol interference [1]. Therefore, synchronization is critical for good performance in OFDM systems.

Much research has focused on improving OFDM synchronization performance and accuracy. Cyclic prefix (CP)-based methods were introduced [2]–[4] to determine frequency offset and symbol timing, but do not themselves find the start of a frame. To assist with this, all OFDM frames begin with preamble symbols, which can also be used to estimate the frequency offset [5]. This relies on the characteristic of a preamble symbol with two identical halves, using autocorrelation of the received signal, which can be computed iteratively at low cost and is robust to frequency offset. However, the metric used results in a plateau, which leads to some uncertainty in determining the start of a frame. Work in [6]–[9] introduced modified timing metrics based on autocorrelation and the characteristics of specific preamble symbols to reduce the ambiguity of the plateau in finding the start of frame. However, the resulting autocorrelation operation is sensitive to additive white Gaussian noise (AWGN) and frequency selectivity. Kishore and Reddy [10] presented an algorithm that requires knowledge of the time domain preamble in the receiver to compute a cross-correlation metric between the known and received preamble symbols. This can accurately determine the start of frame even at a low signal-to-noise ratio (SNR). However, the cross-correlation operation requires complex computation. Kim and Park [11] proposed an accurate synchronization method based upon the preamble symbol specified in IEEE 802.16d using two separate computation processes: first, autocorrelation is computed for coarse symbol time offset (STO) and fractional carrier frequency offset (CFO) estimation to obtain more reliable frequency synchronization and to reduce hardware cost; second, the fine STO and the integer CFO are estimated by performing cross-correlation between the received samples and the known preamble.

Manuscript received February 2, 2012; revised June 6, 2012; accepted July 21, 2012.
T. H. Pham is with Nanyang Technological University, 639798 Singapore, and also with the TUM-CREATE Centre for Electromobility, 138649 Singapore (e-mail: hung3@e.ntu.edu.sg).
S. A. Fahmy and I. V. McLoughlin are with Nanyang Technological University, 639798 Singapore (e-mail: sfahmy@ntu.edu.sg; mcloughlin@ntu.edu.sg).
Digital Object Identifier 10.1109/TVLSI.2012.2210917

Autocorrelation-based techniques are preferred for implementation on FPGA because of their lower hardware costs. Dick and Harris [12] reported on the FPGA implementation of an OFDM transceiver. They showed that FPGAs, with their highly parallel architecture, are suitable for the implementation of OFDM transceivers. Wang et al. [13] also presented an FPGA implementation of an OFDM-WLAN synchronizer. In that work, timing synchronization is obtained by double autocorrelation based on short training symbols, which allows a reduction in the hardware cost on FPGA. Fort et al. [14] compared the performance and complexity of FPGA implementations of autocorrelation and cross-correlation algorithms. Their results show that the accuracy of cross-correlation algorithms is better than that of autocorrelation algorithms. However, the accuracy of cross-correlation comes at significant hardware cost.
Although the new cross-correlator implementation presented in [14] reduces hardware cost compared to a classic cross-correlation approach, it is still at least five times more complex to implement than autocorrelation, because several multipliers are required.

Cross-correlation between received samples and a known preamble can achieve highly accurate timing synchronization; however, this requires significant resources. Multiplierless correlators for timing synchronization were introduced in [15], designed for IEEE 802.11a OFDM frames, based on expressing the correlator coefficients as sums of powers of 2 that only require shift and add operations. The authors identified a correlator that eliminates the need for multiplication, requiring only 26 additions/subtractions per output while maintaining synchronization accuracy similar to that of a multiplier-based implementation.

OFDM is one of the main candidate modulation schemes for cognitive radios, and we believe FPGAs are an ideal platform owing to their flexibility [16]; hence, optimizing this functionality is key. Modern FPGAs contain various resources that can be used to implement cross-correlation. This brief presents the design of several correlators for timing synchronization with preamble symbols based upon IEEE 802.16d. We compare designs using specialized digital signal processing (DSP) slices to a multiplierless approach on Xilinx Virtex-6 and Spartan-6 FPGA devices. Attempting to implement correlation on FPGAs without considering, and designing for, the underlying architecture results in a highly inefficient implementation. In this brief, we show optimized FPGA designs, built to fit the FPGA architecture, and evaluate performance, timing synchronization accuracy, resource utilization, and power consumption, to understand whether a multiplier-based mapping is beneficial when using modern devices.

1063–8210/$31.00 © 2012 IEEE

II. IMPLEMENTATION OF CORRELATORS

The downlink preamble in IEEE 802.16d [17] contains two consecutive OFDM symbols, as shown in Fig. 1. The short symbol consists of four identical 64-sample fragments in time, preceded by a CP. This is followed by the long symbol, which contains two repetitions of a 128-sample fragment and a CP [17].
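To make the preamble layout concrete, the following Python sketch builds a stand-in sequence with the same repetition structure and cross-correlates it against one 64-sample fragment. It is illustrative only: the fragment values are random QPSK points rather than the sequence defined in [17], the CP length of 64 samples is an assumed choice, and the names `make_preamble` and `cross_correlate` are hypothetical.

```python
import random


def make_preamble(rng, cp_len=64):
    """Build a stand-in preamble with the repetition layout described above.

    The sample values are random QPSK points, NOT the standard's sequence,
    and cp_len=64 is an assumption made for this sketch.
    """
    def rand_frag(n):
        return [complex(rng.choice((-1, 1)), rng.choice((-1, 1))) for _ in range(n)]

    short_frag = rand_frag(64)
    long_frag = rand_frag(128)
    short_sym = short_frag * 4   # four identical 64-sample fragments
    long_sym = long_frag * 2     # two repetitions of a 128-sample fragment
    # Each CP is a copy of the last cp_len samples of its symbol.
    preamble = short_sym[-cp_len:] + short_sym + long_sym[-cp_len:] + long_sym
    return preamble, short_frag, long_frag


def cross_correlate(rx, ref):
    """Return |sum_k rx[n + k] * conj(ref[k])|^2 for every lag n."""
    metric = []
    for n in range(len(rx) - len(ref) + 1):
        acc = sum(rx[n + k] * ref[k].conjugate() for k in range(len(ref)))
        metric.append(abs(acc) ** 2)
    return metric


rng = random.Random(0)
preamble, short_frag, _ = make_preamble(rng)
offset = 37                                   # arbitrary frame start for the demo
rx = [0j] * offset + preamble + [0j] * 20     # noiseless received signal
metric = cross_correlate(rx, short_frag)
peak = max(metric)
# Lags whose metric exceeds half the maximum are treated as fragment matches.
peaks = [n for n, m in enumerate(metric) if m > 0.5 * peak]
print(peaks)
```

Because the short symbol repeats the same fragment, correlating against a single fragment produces a train of equally spaced peaks rather than one peak. With the assumed 64-sample CP, which is itself a copy of the fragment, the first peak marks the start of the preamble.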
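The sums-of-powers-of-two idea behind multiplierless correlation [15] can be illustrated in the same way. The Python sketch below is a minimal illustration under stated assumptions, not the quantization scheme of [15] or of this brief: coefficients are taken to be integers (for example, after fixed-point scaling), the decomposition is a simple greedy nearest-power-of-two search, and the names `pow2_terms`, `shift_add_multiply`, and `correlate_shift_add` are hypothetical.

```python
def pow2_terms(coeff, max_terms):
    """Greedily approximate integer coeff by up to max_terms signed powers of two.

    Returns a list of (sign, shift) pairs, each meaning sign * 2**shift.
    This greedy search is an assumption of the sketch, not a published scheme.
    """
    terms = []
    residual = coeff
    for _ in range(max_terms):
        if residual == 0:
            break
        sign = 1 if residual > 0 else -1
        mag = abs(residual)
        low = mag.bit_length() - 1   # floor(log2(mag))
        # Choose whichever of 2**low and 2**(low + 1) is nearer (ties go low).
        shift = low + 1 if (1 << (low + 1)) - mag < mag - (1 << low) else low
        terms.append((sign, shift))
        residual -= sign * (1 << shift)
    return terms


def shift_add_multiply(x, terms):
    """Multiply integer x by a quantized coefficient using shifts and adds only."""
    acc = 0
    for sign, shift in terms:
        shifted = x << shift
        acc = acc + shifted if sign > 0 else acc - shifted
    return acc


def correlate_shift_add(samples, coeff_terms):
    """Dot product of integer samples with quantized coefficients."""
    return sum(shift_add_multiply(x, t) for x, t in zip(samples, coeff_terms))


coeffs = [96, -7, 100]     # example integer coefficients (made up)
samples = [5, 3, -2]       # example integer samples (made up)
two_term = [pow2_terms(c, 2) for c in coeffs]
one_term = [pow2_terms(c, 1) for c in coeffs]
print(correlate_shift_add(samples, two_term), correlate_shift_add(samples, one_term))
```

Allowing fewer terms per coefficient gives a coarser quantization that needs fewer adders, which is the kind of trade-off between resource use and synchronization accuracy that the different quantizations in this brief explore.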