On the Use of GP-GPUs for Accelerating Compute-intensive EDA Applications

Valeria Bertacco, Debapriya Chatterjee
EECS Department, University of Michigan, USA
{valeria|dchatt}@umich.edu

Nicola Bombieri, Franco Fummi, Sara Vinco
Dip. Informatica, Università di Verona, Italy
{name.surname}@univr.it

A. M. Kaushik, Hiren D. Patel
ECE Department, University of Waterloo, CA
{amkaushi|hdpatel}@uwaterloo.ca

Abstract—General purpose graphics processing units (GP-GPUs) have recently been explored as a new computing paradigm for accelerating compute-intensive EDA applications. Such massively parallel architectures have been applied to accelerate the simulation of digital designs during several phases of their development, corresponding to different abstraction levels, specifically: (i) gate-level netlist descriptions, (ii) register-transfer level descriptions and (iii) transaction-level descriptions. This embedded tutorial presents a comprehensive analysis of the best results obtained by adopting GP-GPUs in all these EDA applications.

I. INTRODUCTION

Simulation plays an important role in the validation of digital hardware systems. It is heavily used to evaluate functional correctness and to perform early design and performance trade-offs. This entails a vast number of simulation runs to evaluate many design trade-offs and to validate as many execution scenarios as possible throughout the development of the design. As a result, simulation is one of the most time-, effort- and resource-consuming activities of the entire design cycle. Correspondingly, the performance of simulation affects the time-to-market of many digital designs today. On the other hand, the continued increase in design functionality has resulted in simulation models that are larger and more complex than ever.
This trend further burdens the performance of simulation, leading to much longer completion times for many simulation runs and affecting all stages of development: from early-stage high-level model simulations to the vast efforts dedicated to RTL and gate-level validation. Ultimately, this issue has been preventing design teams from meeting today's stringent time-to-market constraints [1]. As a result, there is considerable interest in developing techniques that expedite the simulation of large and complex digital hardware system models.

Gate-level simulation takes place at late development stages, once the design has undergone the first few synthesis iterations [2], [3], [4]. Its objective is to evaluate the functional and electrical correctness of the netlist description of the system. Often, the reference model used to validate the results of a gate-level simulation is either a high-level design model (such as a C or SystemC model) or an RTL specification. The major issue in gate-level simulation is that it tackles a fairly detailed description of the design; thus the corresponding model is usually extremely large. Consequently, these simulations, when feasible at all, require many hours or days to complete. Early efforts to leverage concurrent processing resources to address this problem include dividing the processing of individual events across multiple machines with fine granularity. This fine granularity would generate a high communication overhead and, depending on the solution, the issue of deadlock avoidance could require specialized event handling [5]. Parallel logic simulation algorithms were also proposed for distributed systems [6], [7] and multiprocessors [8] with some success.

This work has been partially supported by EU project FP7-ICT-2011-7-288166 (TOUCHMORE) and by NSF grant #1217764.

A widely used simulation environment for early design space exploration of digital hardware systems is SystemC [9].
SystemC is an open-source library of C++ classes that allows modelling at the register-transfer level (RTL) and transaction-level (TL) abstractions. SystemC is commonly deployed for early design trade-off evaluation and validation of high-level models. Even though SystemC operates at a higher abstraction level than traditional RTL, SystemC simulation is known to suffer from long simulation times [1]. This aspect has motivated the research community to develop techniques to accelerate SystemC simulations. These techniques include model transformations [10], distributed simulation [11], process splitting [12], and static scheduling of processes [13]. Furthermore, the SystemC reference simulation kernel implements conventional discrete-event semantics in a single-threaded fashion. This architecture prevents the kernel from being easily ported to traditional high-performance multiprocessing platforms (such as SMPs). Overcoming this challenge has been the topic of much research [14], [15], [16], [17], [18].

Recently, another form of concurrent architecture has become widely available: general purpose graphics processing units (GP-GPUs). GP-GPUs are massively parallel architectures that support data-level and thread-level parallelism. While they were originally designed for graphics and scientific computing, researchers have recently been exploring the use of GP-GPUs to accelerate digital design simulation at several levels, particularly those discussed above, namely gate-level descriptions [19], [20], [21], [22] and SystemC descriptions, both when used for RTL and transaction-level models [23], [24], [25], [26], [27].

On the logic simulation front, the key challenge lies in the high amount of unstructured interconnections among the

978-3-9815370-0-0/DATE13/ © 2013 EDAA