Implementing the Thread Programming Model on Hybrid FPGA/CPU Computational Components

David Andrews, Douglas Niehaus, Razali Jidin
Information Technology and Telecommunications Center
Department of Electrical Engineering and Computer Science
University of Kansas
{dandrews,niehaus,rjidin}@ittc.ukans.edu

Abstract

Designers of embedded systems are constantly challenged to provide new capabilities to meet expanding requirements and increased computational needs at ever improving price/performance ratios. Recently emerging hybrid chips containing both CPUs and FPGA components have the potential to enjoy significant economies of scale, while enabling system designers to include a significant amount of specialization within the FPGA component. However, realizing the promise of these new hybrid chips will require programming models supporting a far more integrated view of the CPU and FPGA components than provided by current methods. This paper describes fundamental synchronization methods we are now developing to support a multi-threaded programming model that provides a transparent interface to CPU-based and FPGA-based component threads.

1. Introduction

Designers of embedded and real-time systems are continually challenged to provide increased computational capabilities to meet tighter system requirements at ever improving price/performance ratios. Best practice methods have long promoted the use of commercial off-the-shelf (COTS) components to reduce design costs and time to market. Creating COTS components that can be reused in a wide range of real-time and embedded applications is still a difficult challenge, in part because it requires the simultaneous satisfaction of apparently contradictory design forces: generalization and specialization. System designers are all too familiar with the tension these opposing forces create when trying to balance cost against performance.
Recently emerging hybrid chips containing both CPU and FPGA components are an exciting new development that promises COTS economies of scale while also supporting significant hardware customization. For example, Xilinx [4] offers the Virtex II Pro, which combines up to four PowerPC 405 cores with up to approximately 4 million free gates, while Altera [5] offers the Excalibur, which combines an ARM 922 core with approximately the same number of free gates. Designers now have the freedom to select a set of FPGA IP to create a specialized System-on-a-Chip (SoC) solution. These capabilities allow the designer to enjoy the economies of scale of a COTS device while building on a selected set of IP that yields a design tailored to their specific requirements. Additionally, the free FPGA gates may be used to support customized, application-specific components for performance-critical functions. While the performance of an FPGA-based implementation is still lower than that of an equivalent ASIC, the FPGA-based solution often provides acceptable performance with a significantly better price/performance ratio.

Tapping the full potential of these hybrid chips presents an interesting challenge for system developers. Specifying custom components within the FPGA requires knowledge of hardware design methods and tools, which leaves the full potential of these hybrids tantalizingly out of reach for the majority of system programmers. Researchers are seeking ways around this barrier by investigating new design languages, hardware/software specification environments, and development tools. Projects such as Ptolemy [2], Rosetta [9], and SystemC [6] are investigating system-level specification capabilities that can drive software compilation and hardware synthesis. Other projects such as Streams-C [3] and Handel-C [7] focus on raising the level of abstraction at which FPGAs are programmed from gate-level parallelism to a modified and augmented C syntax.
SystemVerilog [8] and a newly evolving VHDL standard [10] are also now being designed to abstract away the distinction between the two sides of the traditional low-level