Combinational and Sequential Mapping with Priority Cuts

Alan Mishchenko, Sungmin Cho, Satrajit Chatterjee, Robert Brayton
Department of EECS, University of California, Berkeley
{alanmi, smcho, satrajit, brayton}@eecs.berkeley.edu

Abstract

An algorithm for technology mapping of combinational and sequential logic networks is proposed and applied to mapping into K-input lookup tables (K-LUTs). The new algorithm avoids the hurdle of computing all K-input cuts while preserving the quality of the results in terms of both area and depth. The memory and runtime of the proposed algorithm are linear in circuit size and quite affordable even for large industrial designs. For example, computing a good-quality 6-LUT mapping of an AIG with 1M nodes takes 150 MB of RAM and 1 minute on a typical laptop. An extension of the algorithm allows for sequential mapping, which searches the combined space of all possible mappings and retimings. This leads to an 18-22% improvement in depth with a 3-5% LUT count penalty, compared to combinational mapping followed by retiming.

1 Introduction

Technology mapping transforms a technology-independent logic network, called the subject graph, into a network of logic nodes available in the target technology. For Field-Programmable Gate Arrays (FPGAs), each logic node is a K-input lookup table (LUT), which can implement any Boolean function of up to K inputs. The subject graph is often represented as an AND-Inverter Graph (AIG) composed of two-input AND gates and inverters.

Most structural methods of FPGA mapping [6][12] start by computing all, or nearly all, K-feasible cuts for each AIG node. Similar methods exist for standard-cell mapping. The number of such cuts in a network with n nodes is O(n^K) [3]. Next, the AIG nodes are traversed in topological order, and dynamic programming is used to find a depth-optimal LUT mapping of the AIG.
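To make the traditional flow concrete, the following is a minimal sketch (not code from any existing mapper) that enumerates all K-feasible cuts bottom-up and then chooses, for every AND node, a cut of minimum LUT depth. A cut of node n is a set of nodes (leaves) such that every path from a primary input to n passes through a leaf; it is K-feasible if it has at most K leaves. The cuts of an AND node are obtained by merging one cut from each fanin and discarding unions with more than K leaves. Assumptions made for illustration: the AIG is a Python dict mapping each AND node to its two fanins (primary inputs have no entry), edge complementation is ignored because cuts depend only on structure, dominated cuts are not removed, and the function names are hypothetical.

```python
from itertools import product


def enumerate_cuts(fanins, order, k):
    """Exhaustively enumerate K-feasible cuts (hypothetical helper).

    fanins: dict mapping each AND node to its two fanin nodes;
            primary inputs (PIs) have no entry.
    order:  all nodes (PIs included) in topological order.
    """
    cuts = {}
    for n in order:
        if n not in fanins:                      # PI: only the trivial cut
            cuts[n] = {frozenset([n])}
            continue
        a, b = fanins[n]
        # Merge one cut of each fanin; keep unions with at most k leaves.
        merged = {u | v for u, v in product(cuts[a], cuts[b]) if len(u | v) <= k}
        cuts[n] = merged | {frozenset([n])}      # trivial cut, used by fanouts
    return cuts


def depth_optimal_mapping(fanins, order, k):
    """Dynamic programming over the topological order: the LUT depth of a node
    is 1 + the largest depth among the leaves of its best cut (PIs have depth 0)."""
    cuts = enumerate_cuts(fanins, order, k)
    depth, best = {}, {}
    for n in order:
        if n not in fanins:
            depth[n] = 0
            continue
        trivial = frozenset([n])
        candidates = [c for c in cuts[n] if c != trivial]  # a LUT cannot feed itself
        # Minimize depth first; break ties by the number of LUT inputs.
        best[n] = min(candidates,
                      key=lambda c: (1 + max(depth[x] for x in c), len(c)))
        depth[n] = 1 + max(depth[x] for x in best[n])
    return depth, best
```

For a four-input AND tree (`x = a&b`, `y = c&d`, `z = x&y`), K = 4 covers `z` with one LUT over {a, b, c, d} (depth 1), whereas K = 2 only admits the cut {x, y}, giving depth 2. Even this tiny example shows why the cut count grows quickly: with K = 4, node `z` already has five cuts.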
The resulting mapping is then improved by area-recovery heuristics [3][11][12], which reduce the number of logic nodes while preserving the depth of the LUT network. Note that some structural FPGA mapping algorithms, such as FlowMap [2] and CutMap [4], do not compute all cuts. Instead, they find one good cut at each node using a maximum-flow algorithm, but this approach tends to have higher computational complexity and to yield relatively poor area. As a result, a recent state-of-the-art mapper from the same research group [6] is based on cut enumeration rather than maximum flow.

In a large class of programmable architectures, the LUT size K ranges from 3 to 6. For these relatively small LUT sizes, traditional LUT mapping methods based on cut enumeration work well. For K equal to 4 or 5, exhaustive cut enumeration [14][5] can be applied, storing on average 10-40 cuts at each node. When the LUT size is 6, exhaustive enumeration may produce more than 100 cuts per node, so storing the cuts takes substantial memory when mapping large Boolean networks. To remedy this, partial cut enumeration can prune cuts and reduce memory requirements [5]. However, pruning may discard good cuts, so depth-optimality of the mapping is no longer guaranteed.

Another class of modern programmable architectures realizes logic networks using macro-cells, which typically contain LUTs and other logic gates. A straightforward way of mapping logic into macro-cells starts by computing all K-input cuts for each node, where K is the number of macro-cell inputs. Unlike a K-input LUT, a K-input macro-cell cannot implement every logic function of K inputs. Therefore, the local function of each cut is computed in terms of the cut inputs, and only those cuts whose function can be realized by the macro-cell are used for mapping.
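Filtering cuts for a macro-cell requires the local function of each cut. The sketch below computes it as an integer truth table by simulating the cone between the cut leaves and the root, with one bit per input minterm; leaf i in the ordered leaf list becomes variable i. Assumptions for illustration: AND nodes are stored as pairs of `(fanin, complemented)` edges, the function names are hypothetical, and the macro-cell check is a placeholder in which `allowed` is an assumed set of implementable truth tables. A realistic check would also account for permutations and polarities of the macro-cell inputs.

```python
def cut_truth_table(fanins, root, leaves):
    """Return the function of `root` in terms of the ordered `leaves` as an
    integer truth table: bit m is the output for input minterm m."""
    nbits = 1 << len(leaves)
    mask = (1 << nbits) - 1          # all-ones table, used for complementation
    tt = {}
    for i, leaf in enumerate(leaves):
        # Projection of variable i: bit m is set iff bit i of m is 1.
        tt[leaf] = sum(1 << m for m in range(nbits) if (m >> i) & 1)

    def visit(n):
        if n in tt:
            return tt[n]
        if n not in fanins:
            raise ValueError(f"{leaves} is not a cut of {root}: reached input {n}")
        (a, ca), (b, cb) = fanins[n]
        fa = visit(a) ^ (mask if ca else 0)   # complement the edge if needed
        fb = visit(b) ^ (mask if cb else 0)
        tt[n] = fa & fb                       # AND node
        return tt[n]

    return visit(root)


def implementable_cuts(fanins, root, cuts, allowed):
    """Keep only cuts whose local function is in the hypothetical macro-cell
    library `allowed` (leaves are ordered by sorting their names)."""
    return [c for c in cuts
            if cut_truth_table(fanins, root, sorted(c)) in allowed]
```

For example, for `y = (a AND NOT b) AND c`, the cut {a, b, c} yields the truth table 0x20: only minterm 5 (a = 1, b = 0, c = 1) evaluates to 1.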
However, this approach does not scale to macro-cells, which often have 8 or more inputs: the number of 8-input cuts is extremely large, and they can be computed only for the smallest benchmarks.

This paper presents a new algorithm for high-quality mapping whose runtime and memory requirements are linear in the number of nodes in the subject graph. The proposed algorithm avoids exhaustive cut enumeration by computing only a small, fixed number (typically 5-10) of “good” K-feasible cuts at each node. These are called priority cuts. The criteria used to prioritize the cuts depend on the mapping goals. For example, when mapping for depth, the cuts are prioritized first by depth, then by the number of inputs, and finally by area. Experiments indicate that this prioritization gives a depth-optimal mapping for 95% of all benchmarks and LUT sizes, even if only one cut is stored at each node. Increasing the number of priority cuts to 8 lets the algorithm avoid the area penalty of not enumerating all cuts, while still offering dramatic improvements in memory and runtime compared to exhaustive cut enumeration. For 6-input LUTs with 8 priority cuts stored, memory is reduced 10x and runtime 5x compared to previous approaches, while depth and area are comparable or better. For 8-input and larger LUTs, memory and runtime are reduced by about 50x.

The proposed algorithm is extended to sequential mapping, which searches the combined space of combinational K-LUT mapping and retiming of the resulting LUT network. This integrated mapping reduces depth by 20% compared to combinational LUT mapping followed by retiming as a post-processing step.

Although this paper targets FPGA mapping, and the experiments were performed accordingly, cut-based mapping for standard cells, macro-cells, super-gates, etc. is similar. Priority cuts should be equally applicable to these and other applications (e.g. rewriting), and similar improvements can be expected.

The rest of the paper is organized as follows. Section 2 gives background. Section 3 reviews the traditional FPGA mapping algorithm. Section 4 describes the new algorithm. Section 5 presents the extension to sequential mapping. Section 6 reports experimental results. Section 7 concludes the paper and outlines future work.

1-4244-1382-6/07/$25.00 ©2007 IEEE 354
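The priority-cut idea from Section 1 can be sketched as follows. Each node merges only the stored priority cuts of its fanins (plus the trivial fanin cuts), ranks the results first by depth, then by number of inputs, then by area, and keeps at most C of them, so the work and memory per node are bounded by a constant for fixed K and C. This is a minimal illustration under stated assumptions, not the authors' implementation: area is estimated with area flow (a common heuristic that shares a LUT's cost among the fanouts of its root), complemented edges are ignored, K is assumed to be at least 2, and the name `priority_cuts` and parameter `c_max` are hypothetical.

```python
from itertools import product


def priority_cuts(fanins, order, k, c_max):
    """Keep at most c_max cuts per AND node, ranked by (depth, size, area flow).

    fanins: dict mapping each AND node to its two fanin nodes (PIs absent).
    order:  all nodes, PIs included, in topological order.
    Assumes k >= 2, so every AND node has at least one candidate cut.
    Returns the LUT depth of every node and the stored priority cuts.
    """
    fanout = {n: 0 for n in order}
    for a, b in fanins.values():
        fanout[a] += 1
        fanout[b] += 1

    depth, aflow, kept = {}, {}, {}
    for n in order:
        if n not in fanins:                     # PI: depth 0, zero area flow
            depth[n], aflow[n], kept[n] = 0, 0.0, []
            continue
        a, b = fanins[n]
        # Merge only the stored priority cuts of each fanin, plus its trivial cut.
        cuts_a = kept[a] + [frozenset([a])]
        cuts_b = kept[b] + [frozenset([b])]
        candidates = {u | v for u, v in product(cuts_a, cuts_b) if len(u | v) <= k}

        def cost(cut):
            d = 1 + max(depth[x] for x in cut)
            # Area flow (assumed area measure): one LUT plus the leaves' area
            # flow, shared among the fanouts of n.
            af = (1 + sum(aflow[x] for x in cut)) / max(1, fanout[n])
            return (d, len(cut), af)

        kept[n] = sorted(candidates, key=cost)[:c_max]
        depth[n], _, aflow[n] = cost(kept[n][0])  # the best cut defines n's cost
    return depth, kept
```

For instance, on a four-input AND tree (`x = a&b`, `y = c&d`, `z = x&y`) with K = 4, even `c_max = 1` keeps the cut {a, b, c, d} for `z` and finds depth 1. Bounding the stored cuts with `c_max`, rather than pruning after full enumeration, is what keeps memory linear in the number of nodes.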