A Scalable Multi-FPGA Framework for Real-Time Digital Signal Processing
K. M. Irick¹, M. DeBole², S. Park³, A. Al Maashri⁴, S. Kestur⁵, C-L. Yu⁶*, N. Vijaykrishnan⁷
The Department of Computer Science and Engineering,
The Pennsylvania State University, University Park PA, 16802
*Arizona State University, Tempe AZ, 85287
ABSTRACT
FPGAs have emerged as the preferred platform for implementing real-time signal processing applications. In
sub-45 nm technologies, FPGAs offer significant cost and design-time advantages over application-specific custom chips and
consume significantly less power than general-purpose processors while maintaining or improving performance.
Moreover, FPGAs hold advantages over GPUs in their support for control-intensive applications, custom
bit-precision operations, and diverse system interface protocols. Nonetheless, a significant inhibitor to the widespread
adoption of FPGAs has been the expertise required to effectively realize functional designs that maximize application
performance. While there have been several academic and commercial efforts to improve the usability of FPGAs, they
have primarily focused on easing the tasks of an expert FPGA designer rather than increasing the usability offered to an
application developer. This work describes the design of AlgoFLEX, a scalable algorithmic-level design framework
for FPGAs. AlgoFLEX offers rapid algorithmic-level composition and exploration while maintaining the
performance realizable from a fully custom, albeit difficult and laborious, design effort. The framework masks aspects
of accelerator implementation, mapping, and communication while exposing appropriate algorithm tuning facilities to
developers and system integrators. The effectiveness of the AlgoFLEX framework is demonstrated by rapidly mapping
a class of image and signal processing applications to a multi-FPGA platform.
Keywords: FPGA design, Image processing
1. INTRODUCTION
As FPGAs have migrated to lower technology nodes, their regular fabrics have allowed them to improve their power
efficiency and computational capabilities at a rate exceeding that of custom ASICs or even general-purpose microprocessors.
While many of these benefits can be attributed to advances in technology, the ability of FPGAs to meet the performance
requirements of complex signal processing tasks has also been due to the inclusion of specialized components within the
FPGA fabric. These specialized components include multiply-accumulate units, high-speed transceivers, internal
memory, and embedded processors. Together, these architectural enhancements allow FPGAs to provide higher
performance than a traditional “logic slice-only” implementation. For example, the newly announced Virtex-6 devices
from Xilinx include more than 1,000 embedded multiply-accumulate units supporting more than 1 Tera Multiply
Accumulates (TMACs) per second, providing unique computational capabilities for compute-intensive signal processing
algorithms. However, FPGAs face stiff competition from Graphics Processing Units (GPUs) as the hardware platform
of choice in signal processing applications due to the ease with which GPUs can be programmed, an aspect in which
FPGAs typically suffer. Despite being harder to program, FPGAs hold several advantages over GPUs in that they
have the ability to be customized for the appropriate numeric precision, have the flexibility to communicate with diverse
¹ irick@cse.psu.edu
² debole@cse.psu.edu; www.cse.psu.edu/people/debole
³ szp142@cse.psu.edu
⁴ asa161@cse.psu.edu
⁵ kesturvy@cse.psu.edu
⁶ chiliyu@asu.edu
⁷ vijay@cse.psu.edu; www.cse.psu.edu/people/vijay
Mathematics for Signal and Information Processing, edited by Mark S. Schmalz, Gerhard X. Ritter,
Junior Barrera, Jaakko T. Astola, Franklin T. Luk, Proc. of SPIE Vol. 7444, 744416 · © 2009 SPIE
CCC code: 0277-786X/09/$18 · doi: 10.1117/12.834177