J Sign Process Syst (2011) 65:245–259
DOI 10.1007/s11265-011-0606-x

Design Methodology for Offloading Software Executions to FPGA

Tomasz Patyk · Perttu Salmela · Teemu Pitkänen · Pekka Jääskeläinen · Jarmo Takala

Received: 29 January 2011 / Revised: 4 July 2011 / Accepted: 4 July 2011 / Published online: 30 July 2011
© Springer Science+Business Media, LLC 2011

This work has been supported by the Academy of Finland under research grant decision 128126.

T. Patyk (✉) · P. Salmela · T. Pitkänen · P. Jääskeläinen · J. Takala
Department of Computer Systems, Tampere University of Technology, P. O. Box 553, 33101, Tampere, Finland
e-mail: tomasz.patyk@tut.fi

P. Salmela e-mail: perttu.salmela@gmail.com
T. Pitkänen e-mail: teemu.pitkanen@tut.fi
P. Jääskeläinen e-mail: pekka.jaaskelainen@tut.fi
J. Takala e-mail: jarmo.takala@tut.fi

Abstract A field programmable gate array (FPGA) is a flexible solution for offloading part of the computations from a processor. In particular, it can be used to accelerate the execution of a computationally heavy part of a software application, e.g., in DSP, where small kernels are repeated often. Because application code for a processor is software, a design methodology is needed to convert the code into a hardware implementation suitable for the FPGA. In this paper, we propose a design method that uses the Transport Triggered Architecture (TTA) processor template and the TTA-based Co-design Environment toolset to automate the design process. Starting from software, we generate an RTL implementation of an application-specific TTA processor together with the hardware/software interfaces required to offload computations from the system's main processor. To show what the integration of the customized TTA with a new platform could look like, we describe the process of developing the required interfaces from scratch.
Finally, we show how the scalability of the TTA processor can be exploited to meet platform- and application-specific requirements.

Keywords Application-specific integrated circuits · Hardware accelerator · Computer aided engineering · System-on-a-chip · Coprocessors · Field programmable gate arrays

1 Introduction

The growing complexity of software applications running on portable devices such as mobile phones, smartphones, and PDAs calls for an increase in the processing power offered by their CPUs. Typically, a RISC processor employed as a general-purpose processing unit does not provide enough computational resources, and the use of a specialized hardware accelerator is inevitable. A DSP co-processor is a common solution for speeding up multimedia applications. No matter how powerful the DSP processor is, dedicated hardware will perform the same task faster, consume less power, and occupy a smaller silicon area.

Reconfigurable hardware in the form of a field programmable gate array (FPGA) is an excellent way to increase the performance of an embedded system, because part of the application code can be offloaded from the processor. However, the performance increase requires careful planning. Quite often the overhead of such arrangements, e.g., the cost of data transfers between a