Invited Paper: GPlace - A Congestion-aware Placement tool for UltraScale FPGAs

Ryan Pattison, Ziad Abuowaimer, Shawki Areibi, *Gary Gréwal, Anthony Vannelli
School of Engineering, *School of Computer Science
University of Guelph
Guelph, ON, Canada
{rpattiso,abuowaiz,sareibi,ggrewal,vannelli}@uoguelph.ca

ABSTRACT
Traditional FPGA flows that wait until the routing stage to tackle congestion are quickly becoming less effective. This is due to the increasing size and complexity of FPGA architectures and of the designs targeted for them. In this paper, we present two new congestion-aware placement tools for Xilinx UltraScale architectures, called GPlace-pack and GPlace-flat. The former placer participated in the ISPD 2016 Routability-driven Placement Contest for FPGAs, and finished in third place overall. The latter placer was subsequently developed based on our experience in the contest with GPlace-pack. Results obtained indicate that GPlace-flat is on average 5.3× faster than GPlace-pack. The post-routing results show that GPlace-flat is able to obtain a further 22.5% improvement in wirelength and a 40.0% improvement in runtime compared to GPlace-pack.

Keywords
Placement, Field Programmable Gate Array, Congestion, Routing-aware, Heterogeneous, UltraScale Architecture

1. INTRODUCTION
In this paper, we share our experience developing two analytic placement flows for performing FPGA placement: GPlace-pack and GPlace-flat. The former flow participated in the ISPD 2016 Routability-Driven FPGA Placement Contest [16], where it finished third out of nineteen participating institutions. The latter flow was developed post-contest, and outperforms GPlace-pack in terms of quality-of-result, runtime, and total number of successfully routable contest benchmarks.

As the contest title suggests, the goal of this year’s challenge was a departure from earlier years, where the focus was on solving problems relevant to ASIC design.
We chose to enter this year’s contest because we had earlier success developing both serial [14] and parallel [8] analytic placement algorithms for homogeneous FPGAs, and wanted to better understand the challenging issues that arise when targeting modern, heterogeneous FPGA architectures, like the Xilinx UltraScale devices.

ICCAD ’16

Our development team consisted of the five authors on this paper. Development of GPlace-pack began shortly after November 30, 2015, when the first sample benchmark was released by the contest organizers, and was completed on April 6, 2016, when the contest officially ended. The four contest benchmarks provided to contestants all had very different features, leading to a very wide design space of placement strategies and configurations. Especially challenging was finding ways to satisfy the many hard constraints that arise from the complex, heterogeneous architecture of the UltraScale FPGA device. Satisfying these constraints during placement, however, does not guarantee that the subsequent routing phase will complete successfully. This is because congested regions in the placement may exhaust the routing resources available in those regions, causing the router to fail.

Our first approach to addressing these issues was to develop GPlace-pack, a placer which seeks to satisfy hard constraints through judicious packing of logic into slices. We found, however, that this approach leads to poor wirelength or to solutions that do not fit on the FPGA. Therefore, following the contest we developed GPlace-flat, which solves congestion and legalization constraints during global placement, leading to better wirelength and lower congestion.

The main contributions of this paper can be summarized as follows:
1. Two novel congestion-aware placement tools for Xilinx’s UltraScale architectures are presented.
2. A fast method for estimating congestion that is independent of the placement quality is implemented, making its use early in the placement process feasible.
3. A novel bi-partitioning procedure for legalizing flat placements is presented. The procedure satisfies all hard constraints imposed by the UltraScale architecture, while minimizing cell displacement. This procedure eliminates the need for a packing step in the FPGA flow, and allows the placer to optimize globally while enforcing legalization constraints.

The remainder of this paper is organized into the following sections. Section 2 provides an overview of the FPGA placement problem. Section 3 presents related work. In Section 4, the GPlace algorithm is presented with two different implementations. Section 5 reviews the results of placing the contest benchmarks. Finally, the conclusions are presented in Section 6.
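
To make the notion of congestion used in this introduction concrete, the following Python sketch estimates routing demand on a coarse grid and flags the bins whose demand exceeds an assumed capacity. It is an illustration only and is not the estimator used in GPlace: it substitutes a generic bounding-box heuristic, in which each net's half-perimeter wirelength (HPWL) is spread uniformly over the grid bins covered by its bounding box. The function names, the grid model, and the `capacity` parameter are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]   # (x, y) pin location in grid units (assumed model)
Net = List[Point]             # a net is represented by the locations of its pins


def hpwl(pins: Net) -> float:
    """Half-perimeter wirelength of the bounding box enclosing the pins."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def demand_map(nets: List[Net], cols: int, rows: int) -> List[List[float]]:
    """Spread each net's HPWL uniformly over the bins its bounding box covers."""
    grid = [[0.0] * cols for _ in range(rows)]
    for pins in nets:
        xs = [x for x, _ in pins]
        ys = [y for _, y in pins]
        # Clamp the bounding box to the grid so indexing stays in range.
        c0 = min(max(int(min(xs)), 0), cols - 1)
        c1 = min(max(int(max(xs)), 0), cols - 1)
        r0 = min(max(int(min(ys)), 0), rows - 1)
        r1 = min(max(int(max(ys)), 0), rows - 1)
        covered = (c1 - c0 + 1) * (r1 - r0 + 1)
        share = hpwl(pins) / covered
        for r in range(r0, r1 + 1):
            for c in range(c0, c1 + 1):
                grid[r][c] += share
    return grid


def overflow_bins(grid: List[List[float]], capacity: float) -> List[Tuple[int, int]]:
    """Return (row, col) of every bin whose estimated demand exceeds capacity."""
    return [(r, c)
            for r, row in enumerate(grid)
            for c, demand in enumerate(row)
            if demand > capacity]
```

Calling `overflow_bins(demand_map(nets, cols, rows), capacity)` on a candidate placement gives a rough picture of the regions in which a router would be most likely to run out of resources. An estimate of this kind needs only pin locations, so it can be evaluated before any routing is attempted.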