Reducing Compilation Effort in Commercial FPGA Emulation Systems Using Machine Learning

Anthony Agnesina 1, Etienne Lepercq 2, Jose Escobedo 2, and Sung Kyu Lim 1
1 School of ECE, Georgia Institute of Technology, Atlanta, GA
2 Synopsys Inc., Mountain View, CA
agnesina@gatech.edu

Abstract—This paper presents a machine learning (ML) framework to improve the use of computing resources in the FPGA compilation step of a commercial FPGA-based logic emulation flow. Our ML models enable highly accurate prediction of the final P&R design qualities, runtime, and optimal mapping parameters. Using our ML models, we identify key compilation features that may require aggressive compilation effort. Experiments based on our large-scale database from an industrial emulation system show that our ML models help reduce the total number of jobs required for a given netlist by 33%. Moreover, our job scheduling algorithm based on our ML model reduces the overall time to completion of concurrent compilation runs by 24%. In addition, we propose a new method to compute "recommendations" from our ML model in order to re-partition difficult partitions. Tested on a large-scale industry SoC design, our recommendation flow provides an additional 15% compile-time savings for the entire SoC.

I. INTRODUCTION

Modern System on Chip (SoC) designs are often larger and more complex than can be competitively tested under traditional hardware/software co-validation methods. They require billions of cycles of execution, which takes too long to simulate in software. Physical emulation using commercial FPGAs can overcome the time constraints of software simulation for ASICs of up to a billion gates.

To achieve successful mapping of large ASIC designs, an emulator integrates many hundreds of FPGAs. Commercial FPGAs provide larger capacity and faster runtime performance (up to 5 MHz) compared with custom FPGAs or special-purpose custom logic processor-based architectures.
However, commercial FPGAs are not well suited to the very high pin-to-gate ratio requirements of logic emulation systems [1]. Therefore, they often suffer from a time-consuming Place and Route (P&R) step that can quickly become the dominating part of the entire implementation time [2]. As a new compilation run of hundreds of FPGAs might be needed for each design update, a compile time of multiple hours per run is crippling.

The use of machine learning (ML) is already benefiting the semiconductor industry, with applications in formal verification and physical design [3] (e.g., yield modeling and predicting congestion hotspots). Our research suggests that ML can likewise expedite the time-consuming P&R step of FPGA-based physical emulation. Recently, ML has been employed to improve the wirelength, delay, or power of FPGA P&R solutions using design space exploration of CAD tool parameters [4], [5], [6]. In [7], the authors show it is possible to predict the best Quality-of-Results (QoR) placement flow among a reduced set of candidate flows. However, none of these studies focus on important issues related to compile time, nor have they been applied to predict compilation success of very high utilization designs (e.g., up to 75% lookup table (LUT) usage). Indeed, their explorations target small traditional benchmarks or small FPGAs, which are far from the reality of the crowded and complex consumer designs found in SoC emulation.

This material is based upon work supported by the National Science Foundation under Grant No. CNS 16-24731 and the industry members of the Center for Advanced Electronics in Machine Learning.

[Fig. 1: Our multi-FPGA emulation scheme with FPGA recompilation. Flow: SoC RTL → FPGA Partitioning → FPGA Mapping → FPGA P&R → FPGA Programming; on failure, partition X is recompiled.]

The key contributions of this paper are as follows:
• We build a complete ML data pipeline framework, allowing the extraction of numerous predictors.
• Using these predictors and our large-scale commercial FPGA compilation database, we build models delivering high predictability of P&R design qualities, runtime, and optimal mapping parameters of complex designs.
• We show how, by predicting P&R compilation results, we effectively improve the compile time and hardware cost of the P&R step of the emulation process.
• Using our ML model, we demonstrate how our "design recommendations" improve the quality of the partitioning, resulting in overall faster P&R steps.

II. MACHINE LEARNING INFRASTRUCTURE

This work is intended to improve the compilation flow of multi-FPGA-based emulation systems, whose main steps are shown in Figure 1. A given SoC RTL is first translated into a circuit representation. Next, the resulting netlist is partitioned across multiple FPGAs using a multilevel hierarchical ap-

978-1-7281-2350-9/19/$31.00 ©2019 IEEE