Prototyping Globally Asynchronous Locally Synchronous Circuits on Commercial Synchronous FPGAs

Mehrdad Najibi, Kamran Saleh, Mohsen Naderi, Hossein Pedram, Mehdi Sedighi
{najibi, k.saleh, naderi, pedram, msedighi}@ce.aut.ac.ir
Computer Engineering Department, Amirkabir University of Technology (Tehran Polytechnic)
424 Hafez Ave, Tehran 15785, Iran

Abstract

This paper introduces a methodology for prototyping Globally Asynchronous Locally Synchronous (GALS) circuits on commercial synchronous FPGAs. We propose a library of the elements required to implement GALS circuits and discuss general design considerations for implementing a GALS circuit successfully on an FPGA. The library includes clock generators, arbiters, and several port controllers. We explore different implementations of these circuits along with their advantages and disadvantages. Finally, we present a GALS Reed-Solomon decoder as a practical example. The results show that the GALS approach improves the performance of the circuit by 11% and reduces its power consumption by 18.7% to 19.6%, depending on the error rate. On the other hand, the area of the circuit increases by 51%. This is acceptable considering that a purely synchronous circuit with a central controller is decomposed to generate the GALS system, and that 29% of this overhead comes from distributing the controller across the modules. Better decomposition methods can reduce this overhead substantially.

1. Introduction

New SoC designs face the challenge of distributing a high-speed, low-skew clock across a large die. Proper clock distribution needs numerous buffers and a carefully designed clock tree, which introduce considerable area and power overhead. In a high-performance CPU, nearly 40% of the total power is consumed by the clock [1]. Asynchronous design methodologies eliminate such overheads naturally by removing the clock signal from the design.
However, asynchronous circuits are still far from being a widely accepted solution because reliable design tools for them are lacking. The Globally Asynchronous Locally Synchronous (GALS) approach [2] has emerged to solve the clock distribution problem and to offer low-power, low-EMI circuits while retaining the benefits of synchronous design.

The design of a GALS system generally starts with partitioning a fully designed and tested synchronous circuit. The availability of a large number of verified synchronous IP blocks is one of the motivations behind this approach. The synchronous circuit should be partitioned into independent locally synchronous islands (LSIs). This process includes partitioning the data path and decomposing the central controller to generate autonomous synchronous islands. While LSIs perform their internal tasks synchronously, they must be equipped with asynchronous wrappers to be able to participate in globally asynchronous communication.

Figure 1. General GALS module

Figure 1 shows an LSI with its corresponding asynchronous wrapper. The wrapper contains a local clock generator and asynchronous input/output controllers for communicating with other GALS modules or even with fully asynchronous modules. One of the most challenging issues in GALS circuit design is building the GALS library and the methodology that makes synchronous islands appear asynchronous to their environment. Subsequent sections of this paper discuss a promising solution to this challenge in detail.

FPGAs, on the other hand, are widely recognized as an appropriate means for rapid prototyping. However, all commercially available FPGAs are designed to accommodate purely synchronous circuits. Even though considerable efforts are being made towards the design of a dual-technology synchronous/asynchronous FPGA [3], which allows functional validation of GALS circuits, an economically viable version