Journal of Signal Processing Systems
https://doi.org/10.1007/s11265-018-1387-2
Parallel Memory Accessing for FFT Architectures
V. Kitsakis¹ · K. Nakos¹ · D. Reisis¹ · N. Vlassopoulos¹
Received: 1 December 2017 / Revised: 8 March 2018 / Accepted: 30 May 2018
© Springer Science+Business Media, LLC, part of Springer Nature 2018
Abstract
This paper introduces an efficient technique for parallel data addressing in FFT architectures performing in-place
computations. The novel addressing organization provides parallel load and store of the data involved in radix-r butterfly
computations and leads to an efficient architecture when r is a power of 2. The addressing scheme is based on a permutation
of the FFT data, which improves the address generating circuit and the butterfly processor control. Moreover, the proposed
technique is suitable for mixed-radix applications, especially for radices that are powers of 2, and for straightforward
continuous-flow implementation. The paper presents the technique and the resulting FFT architecture and shows
the advantages of the architecture compared to hitherto published results. Implementations of the in-place radix-8 FFT
architectures with input sizes of 64 and 512 complex points on a Xilinx Virtex-7 VC707 FPGA validate the results.
Keywords FFT · Parallel memory access · In-place architecture · FPGA implementation
D. Reisis
dreisis@phys.uoa.gr
V. Kitsakis
bkits@phys.uoa.gr
K. Nakos
knakos@phys.uoa.gr
N. Vlassopoulos
nvlassop@phys.uoa.gr
¹ Department of Physics, Electronics Laboratory, National and Kapodistrian University of Athens, Physics Building. IV, Panepistimiopolis, 157-84 Athens, Greece
1 Introduction
The evolving applications in the areas of signal processing
and telecommunications demand FFT computations performed
at high speed with minimal resources. FFT architectures
targeting low-cost implementations include a radix-b
butterfly processor and a memory storing the N input points,
which, by the use of in-place techniques, also stores the
results of the intermediate and output stages of the FFT. Speeding
up the computations can be achieved by including b memory
banks and an addressing scheme, which loads and stores
in parallel the b input and the b output data of each radix-b
butterfly computation [1–4]. The architecture becomes more
efficient by minimizing the cost of the circuits generating
and routing the addresses of the data fetched in parallel and
also the cost of the circuits generating the related twiddles.
Parallelizing the load and store operations of the butterfly
data has been studied in [3–7, 9, 10, 12, 13]. The author
of [3] gave a solution for radix-b FFT computations,
which includes b memory banks and performs an initial
data distribution in the b banks with a complex address
generation circuit. The solution for radix-2 presented in [4]
uses output registers to resolve the conflict while storing
the results of the butterfly. Reisis and Vlassopoulos [7]
showed the set of permutations that solve the parallel access
problem in N-point FFT computations with radix-b
and b banks; these permutations require $\log_b^2 N$ bit LUTs
for their realization. A technique based on the stride permutation
is presented in [6] without proof, and its address generation
has the same complexity as that in [3]. Related work
proving that the stride permutation can be used to minimize
the number of the required adders in the address generation
for streaming applications, including the FFT, is presented
in [11]. Techniques for radix-2 FFTs are reported in [5, 8]:
the work in [5] shows a heuristic approach, and [8] introduces a parallel
addressing scheme exploiting the Gray code properties. The
authors of [12] present an improved architecture for the case
of real-valued FFTs.
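The parallel access requirement discussed above can be made concrete with a minimal Python sketch. It does not reproduce the permutation proposed in this paper or the exact schemes of the cited works. It assumes, purely for illustration, a digit-sum bank assignment (the bank of a point is the sum of the base-b digits of its index, modulo b) and an in-place radix-b FFT in which the b points of a butterfly at a given stage differ only in one base-b digit of their index. All function names are hypothetical.

```python
def digits(index, radix, num_digits):
    """Return the base-`radix` digits of `index`, least significant first."""
    out = []
    for _ in range(num_digits):
        out.append(index % radix)
        index //= radix
    return out


def digit_sum_bank(index, radix, num_digits):
    """Illustrative bank assignment (an assumption): digit sum modulo radix."""
    return sum(digits(index, radix, num_digits)) % radix


def naive_bank(index, radix, num_digits):
    """Naive assignment for comparison: least significant digit only."""
    return index % radix


def butterfly_groups(radix, num_digits, stage):
    """Yield the index groups of the in-place butterflies at `stage`.

    The indices of one group differ only in base-`radix` digit `stage`.
    """
    n = radix ** num_digits
    stride = radix ** stage
    for base in range(n):
        if (base // stride) % radix == 0:  # digit `stage` of base is zero
            yield [base + k * stride for k in range(radix)]


def is_conflict_free(radix, num_digits, bank_fn=digit_sum_bank):
    """Check that every butterfly at every stage touches `radix` distinct banks."""
    for stage in range(num_digits):
        for group in butterfly_groups(radix, num_digits, stage):
            banks = {bank_fn(i, radix, num_digits) for i in group}
            if len(banks) != radix:
                return False
    return True


if __name__ == "__main__":
    # Radix-8 FFTs of 64 and 512 points, the sizes considered in this paper.
    print(is_conflict_free(8, 2), is_conflict_free(8, 3))
    # The naive assignment places all points of a stage-1 butterfly in one bank.
    print(is_conflict_free(8, 2, bank_fn=naive_bank))
```

Under these assumptions, the check succeeds for the digit-sum assignment and fails for the naive one, which illustrates why a dedicated addressing scheme is needed when the b operands of each butterfly are to be loaded and stored in a single memory cycle. The sketch says nothing about the cost of the address generation circuit, which is the aspect that the works cited above aim to reduce.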