An Evolutionary Approach to Area-Time Optimization of FPGA designs

Fabrizio Ferrandi, Pier Luca Lanzi, Gianluca Palermo, Christian Pilato, Donatella Sciuto, Antonino Tumeo
Politecnico di Milano, Dipartimento di Elettronica e Informazione
Via Ponzio 34/5, Milano, Italy
{ferrandi,gpalermo,lanzi,sciuto,tumeo}@elet.polimi.it, chrpilat@tin.it

Abstract—This paper presents a new methodology based on evolutionary multi-objective optimization (EMO) to synthesize multiple complex modules on programmable devices (FPGAs). Starting from a behavioral description written in a common high-level language (for instance C), it automatically produces the register-transfer level (RTL) design in a hardware description language (e.g. Verilog). Since the high-level synthesis problems (scheduling, allocation and binding) are notoriously NP-complete and interdependent, the three problems should be considered simultaneously. This leads to a wide design space that must be thoroughly explored to obtain solutions that satisfy the design constraints. Evolutionary algorithms are good candidates to tackle such complex explorations. In this paper we provide a solution based on the Non-dominated Sorting Genetic Algorithm (NSGA-II) that explores the design space in order to obtain the best solutions in terms of performance, given the area constraints of a target FPGA device. Moreover, we have integrated a good cost estimation model that guarantees the quality of the solutions found without requiring a complete synthesis to validate each generation, which would be an impractical and time-consuming operation. We show on the JPEG case study that the proposed approach provides good results in terms of the trade-off between total occupied area and execution time.

I. INTRODUCTION

High-Level Synthesis (HLS) is concerned with the design and implementation of digital circuits starting from a behavioral description, subject to a set of goals and constraints, and given a library of different types of resources.
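The abstract relies on NSGA-II, whose selection step ranks candidate designs by Pareto dominance over competing goals such as area and execution time. The following is a minimal Python sketch of that ranking step only (fast non-dominated sorting), assuming each candidate has already been reduced to an (area, latency) pair by some cost estimator. The function names and the sample values are hypothetical. NSGA-II's crowding distance, crossover and mutation are deliberately left out.

```python
from typing import List, Sequence, Tuple

# A design point is (area, latency); both objectives are minimized.
Design = Tuple[float, float]


def dominates(a: Design, b: Design) -> bool:
    """True if a Pareto-dominates b: no worse in every objective
    and strictly better in at least one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated_fronts(designs: Sequence[Design]) -> List[List[int]]:
    """Rank designs into successive Pareto fronts (lists of indices),
    following the fast non-dominated sorting scheme of NSGA-II."""
    n = len(designs)
    dominated_by = [[] for _ in range(n)]  # designs that p dominates
    domination_count = [0] * n             # how many designs dominate p
    fronts: List[List[int]] = [[]]
    for p in range(n):
        for q in range(n):
            if dominates(designs[p], designs[q]):
                dominated_by[p].append(q)
            elif dominates(designs[q], designs[p]):
                domination_count[p] += 1
        if domination_count[p] == 0:
            fronts[0].append(p)  # nobody dominates p: first front
    i = 0
    while fronts[i]:
        next_front = []
        for p in fronts[i]:
            for q in dominated_by[p]:
                domination_count[q] -= 1
                if domination_count[q] == 0:
                    next_front.append(q)
        i += 1
        fronts.append(next_front)
    return fronts[:-1]  # drop the trailing empty front


# Hypothetical (area in CLBs, latency in cycles) values, for illustration only.
candidates = [(300, 10), (280, 12), (350, 9), (320, 12), (400, 15)]
print(non_dominated_fronts(candidates))  # [[0, 1, 2], [3], [4]]
```

Designs 0, 1 and 2 form the first front because each trades area against latency without being beaten on both; the selection pressure in NSGA-II then favors lower-ranked fronts.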
The behavioral description specifies behavior in terms of operations, assignment statements and control constructs in a common high-level language (e.g. the C language). The resource library provides a choice of components from which the synthesizer may select the one that best matches the design constraints and maximizes the optimization objectives. The overall target architecture of the HLS flow is typically based on the FSMD model [1]: a datapath controlled by a finite state machine. At the RTL level, a datapath is composed of functional units, storage and interconnection elements. The finite state machine specifies the set of micro-operations that the datapath performs in each control step.

High-level synthesis involves three main tasks: operation scheduling, resource allocation and controller synthesis. Operation scheduling determines the control step in which each operation starts its execution. Resource allocation is concerned with assigning operations and values to hardware components and interconnecting them using connection elements. Solving these problems efficiently is non-trivial because of their NP-complete nature [2]. Controller synthesis provides the logic that issues datapath operations based on the control flow. Recent studies [3] have demonstrated that interconnection costs must be taken into account, since the area of multiplexers and interconnection elements has come to far outweigh the area of functional units and registers (see Table I).

TABLE I
CHARACTERIZATION OF ADDER, MULTIPLIER AND MUXES [3]

Functional unit / MUX | Implementation | Area (CLB) | Delay (ns) | Power (W)
adder24bit cla | Carry look-ahead | 26 | 11.8 | 0.010
mul18bit wall | Booth-recoded Wallace | 280 | 14.8 | 0.308
mux24bit 2to1 | Synopsys design | 6 | 0.6 | 0.002
mux24bit 8to1 | Synopsys design | 66 | 4.6 | 0.023
mux24bit 32to1 | Synopsys design | 276 | 10.9 | 0.240
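To see why the Table I figures make interconnect impossible to ignore, consider a toy estimator that charges both functional-unit and multiplexer area when several 24-bit additions are bound to a given number of adders. This is a simplified sketch under assumptions not taken from the paper: operations are spread evenly over the adders, each adder gets one multiplexer per operand port sized by the number of operations it executes, missing multiplexer sizes are rounded up to the next size listed in Table I, and registers, output steering and the controller are ignored. The names `mux_area` and `adder_datapath_area` are hypothetical.

```python
import math

# Areas in CLBs from Table I (24-bit carry look-ahead adder, 24-bit muxes).
# Assumption: a 1-input "multiplexer" is just a wire, so it costs nothing.
ADDER_AREA = 26
MUX_AREA = {1: 0, 2: 6, 8: 66, 32: 276}


def mux_area(inputs: int) -> int:
    """Area of a 24-bit multiplexer with the given number of data inputs.
    Assumption: sizes missing from Table I round up to the next listed size."""
    for size in sorted(MUX_AREA):
        if inputs <= size:
            return MUX_AREA[size]
    raise ValueError(f"no multiplexer with {inputs} inputs in Table I")


def adder_datapath_area(n_ops: int, n_units: int) -> int:
    """Toy FU + interconnect area for binding n_ops additions to n_units adders.
    Each adder has a multiplexer on both operand ports, sized by its op count."""
    if not 1 <= n_units <= n_ops:
        raise ValueError("need 1 <= n_units <= n_ops")
    ops_per_unit = math.ceil(n_ops / n_units)
    per_unit = ADDER_AREA + 2 * mux_area(ops_per_unit)
    return n_units * per_unit


for units in (1, 4, 8):
    print(units, adder_datapath_area(8, units))
# 1 158
# 4 152
# 8 208
```

Under this toy model, four adders with 2-to-1 multiplexers (152 CLBs) come out slightly smaller than one fully shared adder with two 8-to-1 multiplexers (158 CLBs), because the larger multiplexers cost more than the extra adders they replace. The numbers only illustrate the mechanism; they are not results reported by the authors.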
The weight of interconnection area is especially pronounced in FPGA designs, because a larger number of transistors must be provided in the wiring channels and logic blocks to make signal transmission programmable. This strongly motivates the design of highly effective algorithms that reduce the number and size of the multiplexers generated during high-level synthesis: a methodology that does not consider them produces an incomplete area estimate. This could lead to an incorrect final design, in which the interconnection elements push the area beyond the global constraints. In fact, a design with more functional units or registers can sometimes reduce the total area by considerably reducing the interconnection elements. As a result, interconnection allocation should be taken into account by any methodology that aims to minimize the area of an FPGA design.

Evolutionary algorithms are good candidates for high-level synthesis because they iteratively improve a set of solutions (thus refining several alternative designs at once), they do not require the quality (cost) function to be linear (e.g. the time-area product), and they are known to work well on problems with large and non-