A Scalable Multi-FPGA Framework for Real-Time Digital Signal Processing

K. M. Irick¹, M. DeBole², S. Park³, A. Al Maashri⁴, S. Kestur⁵, C-L. Yu⁶*, N. Vijaykrishnan⁷

The Department of Computer Science and Engineering, The Pennsylvania State University, University Park PA, 16802
*Arizona State University, Tempe AZ, 85287

¹ irick@cse.psu.edu; ² debole@cse.psu.edu, www.cse.psu.edu/people/debole; ³ szp142@cse.psu.edu; ⁴ asa161@cse.psu.edu; ⁵ kesturvy@cse.psu.edu; ⁶ chiliyu@asu.edu; ⁷ vijay@cse.psu.edu, www.cse.psu.edu/people/vijay

Mathematics for Signal and Information Processing, edited by Mark S. Schmalz, Gerhard X. Ritter, Junior Barrera, Jaakko T. Astola, Franklin T. Luk, Proc. of SPIE Vol. 7444, 744416 · © 2009 SPIE · CCC code: 0277-786X/09/$18 · doi: 10.1117/12.834177

ABSTRACT

FPGAs have emerged as the preferred platform for implementing real-time signal processing applications. In sub-45nm technologies, FPGAs offer significant cost and design-time advantages over application-specific custom chips and consume significantly less power than general-purpose processors while maintaining or improving performance. Moreover, FPGAs hold advantages over GPUs in their support for control-intensive applications, custom bit-precision operations, and diverse system interface protocols. Nonetheless, a significant inhibitor to the widespread adoption of FPGAs has been the expertise required to realize functional designs that maximize application performance. While there have been several academic and commercial efforts to improve the usability of FPGAs, they have primarily focused on easing the tasks of an expert FPGA designer rather than on increasing the usability offered to an application developer.

In this work, the design of a scalable algorithmic-level design framework for FPGAs, AlgoFLEX, is described. AlgoFLEX offers rapid algorithmic-level composition and exploration while maintaining the performance realizable from a fully custom, albeit difficult and laborious, design effort. The framework masks aspects of accelerator implementation, mapping, and communication while exposing appropriate algorithm tuning facilities to developers and system integrators. The effectiveness of the AlgoFLEX framework is demonstrated by rapidly mapping a class of image and signal processing applications to a multi-FPGA platform.

Keywords: FPGA design, Image processing

1. INTRODUCTION

As FPGAs have migrated to lower technology nodes, their regular fabrics have allowed them to improve their power efficiency and computational capabilities at a rate exceeding that of custom ASICs or even general-purpose microprocessors. While many of these benefits can be attributed to advances in technology, the ability of FPGAs to meet the performance requirements of complex signal processing tasks has also been due to the inclusion of specialized components within the FPGA fabric. These specialized components include multiply-accumulate units, high-speed transceivers, internal memory, and embedded processors. Together, these architectural enhancements allow FPGAs to provide higher performance than a traditional "logic slice-only" implementation. For example, the newly announced Virtex-6 devices from Xilinx include more than 1,000 embedded multiply-accumulate units supporting more than 1 tera multiply-accumulates (TMACs) per second, providing unique computational capabilities for compute-intensive signal processing algorithms.

However, FPGAs face stiff competition from Graphics Processing Units (GPUs) as the hardware platform of choice in signal processing applications because of the ease with which GPUs can be programmed, an aspect in which FPGAs typically suffer. Despite being harder to program, FPGAs hold several advantages over GPUs in that they have the ability to be customized for the appropriate numeric precision, have the flexibility to communicate with diverse