Exploring Opportunities to Improve the Performance of a Reconfigurable Instruction Set Processor

NIKOLAOS VASSILIADIS*, GEORGE THEODORIDIS AND SPIRIDON NIKOLAIDIS

Section of Electronics and Computers, Department of Physics, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece

*Corresponding author. E-mail: nivas@physics.auth.gr

In this paper, we target a Reconfigurable Instruction Set Processor (RISP) that tightly couples a coarse-grain Reconfigurable Functional Unit (RFU) to a RISC processor. The architecture is supported by a flexible development framework. By allowing the definition of alternative architectural parameters, the framework can be used to explore the design space and fine-tune the architecture at design time. First, two architectural enhancements, namely partial predicated execution and virtual opcode, are proposed, and the extensions made to the architecture and the framework to support them are presented. To evaluate these enhancements, kernels from the multimedia domain are considered and an exploration is performed to derive an appropriate instance of the architecture. The efficiency of the derived instance and of the proposed enhancements is evaluated using an MPEG-2 encoder application.

Keywords: Reconfigurable processors; development framework; virtual opcode; predicated execution

1 Introduction

Modern applications implemented in embedded systems are characterized by a broad diversity of algorithms, rapid evolution of standards, and high performance demands. To amortize cost over high production volumes, embedded systems must exhibit high levels of flexibility and adaptation to achieve fast time-to-market and reusability. An appealing option, broadly referred to as reconfigurable computing, is to couple a standard processor with Reconfigurable Hardware (RH), thereby combining the advantages of both resources (DeHon and Wawrzynek 1999).
The processor provides the bulk of the flexibility, since it can be used to implement any algorithm. In addition, the incorporation of the RH offers potentially infinite dynamic instruction set extensions, allowing the system to adapt to the targeted application.

In this paper, we target a previously proposed dynamic RISP architecture (Vassiliadis et al. 200a), which consists of a typical RISC processor extended by a tightly coupled coarse-grain RFU. The efficient integration of the RFU in the control unit and the datapath of the processor eliminates the communication overhead between them. To increase performance, the RFU executes Multiple-Input-Single-Output (MISO) clusters of primitive operations as reconfigurable instructions. In this way, a number of data-independent operations can be executed in parallel by the RFU, increasing the Instruction Level Parallelism (ILP) of the processor without the need for a long instruction word. Moreover, the RFU is “floating” between the processor’s pipeline stages, combining spatial and temporal computation. This makes it possible to execute MISOs with smaller latency and better utilization of the available hardware.

For such a hybrid architecture, the traditional framework required to explore the design space and develop an application for the target processor must be appropriately extended (Barat et al. 2002). Therefore, a development framework for the target RISP architecture is also considered (Vassiliadis et al. 2006). By allowing different values for various architectural parameters, the framework can be retargeted to different instances of the architecture. Thus, the design space can be explored and the architecture fine-tuned at design time for a targeted application. The framework transparently