Reducing Compilation Effort in Commercial FPGA
Emulation Systems Using Machine Learning
Anthony Agnesina¹, Etienne Lepercq², Jose Escobedo², and Sung Kyu Lim¹
¹School of ECE, Georgia Institute of Technology, Atlanta, GA
²Synopsys Inc., Mountain View, CA
agnesina@gatech.edu
Abstract—This paper presents a machine learning (ML) frame-
work to improve the use of computing resources in the FPGA
compilation step of a commercial FPGA-based logic emulation
flow. Our ML models accurately predict the final P&R design
qualities, runtime, and optimal mapping parameters. Using
these models, we identify key compilation features that may
require aggressive compilation effort. Experiments based on
our large-scale database from an industrial emulation system
show that our ML models help reduce the total
number of jobs required for a given netlist by 33%. Moreover,
our job scheduling algorithm based on our ML model reduces
the overall time to completion of concurrent compilation runs
by 24%. In addition, we propose a new method to compute
“recommendations” from our ML model, in order to perform re-
partitioning of difficult partitions. Tested on a large-scale industry
SoC design, our recommendation flow provides an additional 15%
compile-time savings for the entire SoC.
I. INTRODUCTION
Modern System-on-Chip (SoC) designs are often larger and
more complex than can be tested effectively with traditional
hardware/software co-validation methods. They require
billions of cycles of execution, which takes too long to simulate
in software. Physical emulation using commercial FPGAs
overcomes the time constraints of software simulation for
ASICs of up to a billion gates.
To achieve successful mapping of large ASIC designs, an
emulator integrates many hundreds of FPGAs. Commercial
FPGAs can provide larger capacity and faster runtime perfor-
mance (up to 5MHz) compared with custom FPGAs or special-
purpose custom logic processor-based architectures. However,
these FPGAs do not meet the very high pin-to-gate ratio
requirements of logic emulation systems [1]. Therefore, they
often suffer from a time-consuming Place and Route (P&R)
step that can quickly become the dominant part of the
entire implementation time [2]. As a new compilation run of
hundreds of FPGAs might be needed for each design update,
a compile time of multiple hours each is crippling.
The use of machine learning (ML) is already benefiting
the semiconductor industry, with applications in formal ver-
ification and physical design [3] (e.g. yield modeling and
predicting congestion hotspots). Our research suggests that
ML can also expedite the time-consuming FPGA P&R step
of physical emulation. Recently, ML has been employed
to improve the wirelength, delay, or power of FPGA P&R solutions
[Footnote: This material is based upon work supported by the National Science
Foundation under Grant No. CNS 16-24731 and the industry members of
the Center for Advanced Electronics in Machine Learning.]
[Figure: SoC RTL → FPGA Partitioning → FPGA Mapping → FPGA P&R → (success) FPGA Programming; (fail) recompile partition X.]
Fig. 1: Our multi-FPGA emulation scheme with FPGA recompilation.
using Design Space Exploration of CAD tool parameters [4],
[5], [6]. In [7], the authors show it is possible to predict
the best Quality-of-Results (QoR) placement flow among a
reduced set of candidate flows. However, none of these studies
focuses on compile time, nor has any been used to predict the
compilation success of very high-utilization designs (e.g., up
to 75% lookup table (LUT) usage). Their exploration targets
small traditional benchmarks or small FPGAs, far from the
crowded, complex consumer designs found in SoC
emulation. The key contributions of this paper are as follows:
• We build a complete ML data pipeline framework, allow-
ing for the extraction of numerous predictors.
• Using these predictors and our large-scale commercial
FPGA compilation database, we build models that accurately
predict the P&R design qualities, runtime, and
optimal mapping parameters of complex designs.
• We show how—by predicting P&R compilation results—
we effectively improve the compile time and hardware
cost of the P&R step of the emulation process.
• Using our ML model, we demonstrate how our “design
recommendations” improve the quality of the partition-
ing, resulting in overall faster P&R steps.
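As a toy illustration of the prediction idea only (not the paper's actual model, features, or data), classifying whether a partition's P&R will succeed from simple netlist metrics can be sketched with a nearest-neighbor vote over past compilations; the feature names `lut_util` and `num_clocks` and the tiny `HISTORY` set are hypothetical:

```python
import math

# Hypothetical training set: (features, label) pairs, where features are
# example netlist metrics (LUT utilization %, clock-domain count) and the
# label records whether FPGA P&R finished within the time budget.
HISTORY = [
    ((45.0, 4), "success"),
    ((52.0, 6), "success"),
    ((74.0, 12), "fail"),
    ((70.0, 10), "fail"),
]

def predict_pnr_outcome(lut_util, num_clocks, k=3):
    """Predict P&R outcome via a k-nearest-neighbor vote over HISTORY."""
    # Scale the clock count so both features contribute comparably.
    dist = lambda f: math.hypot(f[0] - lut_util, (f[1] - num_clocks) * 5.0)
    nearest = sorted(HISTORY, key=lambda rec: dist(rec[0]))[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)

print(predict_pnr_outcome(48.0, 5))   # lightly utilized partition
print(predict_pnr_outcome(73.0, 11))  # crowded partition
```

A real flow would learn from thousands of logged compilations with far richer predictors; the sketch only shows the input/output shape of such a predictor.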
II. MACHINE LEARNING INFRASTRUCTURE
This work is intended to improve the compilation flow of
multi-FPGA-based emulation systems, whose main steps are
shown in Figure 1. A given SoC RTL is first translated into
a circuit representation. Next, the resulting netlist is partitioned
across multiple FPGAs using a multilevel hierarchical ap-
978-1-7281-2350-9/19/$31.00 ©2019 IEEE