Combinational and Sequential Mapping with Priority Cuts
Alan Mishchenko Sungmin Cho Satrajit Chatterjee Robert Brayton
Department of EECS, University of California, Berkeley
{alanmi, smcho, satrajit, brayton}@eecs.berkeley.edu
Abstract
An algorithm for technology mapping of combinational and
sequential logic networks is proposed and applied to mapping
into K-input lookup-tables (K-LUTs). The new algorithm avoids
the hurdle of computing all K-input cuts while preserving the
quality of the results, in terms of area and depth. The memory and
runtime of the proposed algorithm are linear in circuit size and
quite affordable even for large industrial designs. For example,
computing a good-quality 6-LUT mapping of an AIG with 1M
nodes takes 150 MB of RAM and 1 minute on a typical laptop. An
extension of the algorithm allows for sequential mapping, which
searches the combined space of all possible mappings and
retimings. This leads to an 18-22% improvement in depth with a
3-5% LUT count penalty, compared to combinational mapping
followed by retiming.
1 Introduction
Technology mapping transforms a technology-independent
logic network, called the subject graph, into a network of logic
nodes. For Field-Programmable Gate Arrays (FPGAs), each logic
node is represented using a K-input look-up table (LUT)
implementing any Boolean function of up to K inputs. The subject
graph is often represented as an AND-Inverter Graph (AIG)
composed of two-input ANDs and inverters.
Most structural methods of FPGA mapping [6][12] start by
computing all, or nearly all, K-feasible cuts for each AIG node.
Similar methods exist for standard cell mapping. The number of
such cuts in a network with n nodes is $O(n^K)$ [3]. Next, the AIG
nodes are traversed in a topological order and a dynamic
programming approach is used to find an optimum-depth LUT
mapping of the AIG. This mapping is transformed by applying
area-recovery heuristics [3][11][12], which reduce the number of
logic nodes while preserving the depth of the LUT network.
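The traditional flow described above can be illustrated with a minimal Python sketch. It assumes a hypothetical AIG encoding: `aig` maps each AND node to its two fanins, and nodes absent from `aig` are primary inputs. Inverters are omitted because they do not change which cuts exist. `enumerate_cuts` merges the fanin cut sets and keeps unions with at most K leaves. `map_for_depth` then picks, for each node, the cut with the smallest arrival time. Dominated-cut filtering and area recovery are left out, so this sketch does not reproduce the cited mappers.

```python
from itertools import product

def enumerate_cuts(aig, order, k):
    """Exhaustive K-feasible cut enumeration (no dominance filtering)."""
    cuts = {}
    for n in order:                      # order is topological: fanins come first
        trivial = frozenset([n])
        if n not in aig:                 # primary input: only the trivial cut
            cuts[n] = {trivial}
            continue
        a, b = aig[n]
        # Every union of one fanin-a cut and one fanin-b cut with <= k leaves.
        cuts[n] = {ca | cb for ca, cb in product(cuts[a], cuts[b])
                   if len(ca | cb) <= k}
        cuts[n].add(trivial)             # lets fanouts stop at this node
    return cuts

def map_for_depth(aig, order, cuts):
    """Dynamic programming: each AND node takes the cut with minimum LUT depth."""
    depth, best = {}, {}
    for n in order:
        if n not in aig:
            depth[n] = 0
            continue
        arrival = lambda cut: 1 + max(depth[leaf] for leaf in cut)
        # The trivial cut {n} cannot implement n itself, so it is skipped.
        best[n] = min((c for c in cuts[n] if n not in c), key=arrival)
        depth[n] = arrival(best[n])
    return depth, best

# z = (a AND b) AND (c AND d)
aig = {"x": ("a", "b"), "y": ("c", "d"), "z": ("x", "y")}
order = ["a", "b", "c", "d", "x", "y", "z"]
for k in (2, 4):
    depth, best = map_for_depth(aig, order, enumerate_cuts(aig, order, k))
    print(k, depth["z"], sorted(best["z"]))
```

With K = 2, the root needs two LUT levels. With K = 4, a single 4-LUT over {a, b, c, d} covers the whole tree.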
It should be noted that some structural FPGA mapping
algorithms, e.g. FlowMap [2] and CutMap [4], do not compute all
cuts. Instead, one good cut is found at each node using the
maximum-flow algorithm, but this approach tends to have higher
computational complexity and relatively poor area. As a result, a
recent state-of-the-art mapper produced by that same research
group [6] is based on cut enumeration rather than maximum flow.
In a large class of programmable architectures, the LUT size, K,
varies between 3 and 6. For these relatively small LUT sizes, the
traditional methods for LUT mapping based on cut enumeration
work well. For K equal to 4 or 5, exhaustive cut enumeration
[14][5] can be applied, resulting in an average of 10-40 cuts stored
at each node. When the LUT size is 6, exhaustive cut enumeration
may lead to 100+ cuts per node. As a result, cut representation
takes substantial memory when mapping large Boolean networks.
To remedy this, a partial cut enumeration can be used to prune the
cuts resulting in reduced memory requirements [5]. However, cut
pruning may result in losing good cuts, so that depth-optimality of
mapping is not guaranteed.
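The hypothetical sketch below shows how quickly exhaustive cut sets grow with K. It counts the enumerated cuts on a balanced tree of 2-input ANDs for K = 4, 5, 6. A tree has no reconvergence, so the absolute counts are only illustrative and do not correspond to the per-node averages quoted above. What matters is the growth trend as K increases.

```python
from itertools import product

def enumerate_cuts(aig, order, k):
    """Exhaustive K-feasible cuts; aig maps AND nodes to their two fanins."""
    cuts = {}
    for n in order:                                   # topological order
        cuts[n] = {frozenset([n])}                    # trivial cut
        if n in aig:
            a, b = aig[n]
            cuts[n] |= {x | y for x, y in product(cuts[a], cuts[b])
                        if len(x | y) <= k}
    return cuts

def balanced_and_tree(num_inputs):
    """Balanced tree of 2-input ANDs over num_inputs (assumed a power of two)."""
    order = [f"i{j}" for j in range(num_inputs)]
    aig, level = {}, list(order)
    while len(level) > 1:
        nxt = []
        for a, b in zip(level[0::2], level[1::2]):
            name = f"n{len(aig)}"
            aig[name] = (a, b)
            order.append(name)
            nxt.append(name)
        level = nxt
    return aig, order, level[0]

aig, order, root = balanced_and_tree(16)
for k in (4, 5, 6):
    cuts = enumerate_cuts(aig, order, k)
    per_node = sum(len(cuts[n]) for n in aig) / len(aig)
    print(f"K={k}: {len(cuts[root])} cuts at the root, "
          f"{per_node:.1f} per AND node on average")
```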
Another class of modern programmable architectures realizes
logic networks using macro-cells, which typically contain LUTs
and other logic gates. A straightforward way of mapping logic
into programmable macro-cells starts by computing all K-input
cuts for each node where K is the number of macro-cell inputs.
Unlike a K-input LUT, a K-input macro-cell cannot implement all
logic functions of K inputs. Therefore, the local function of each
cut is computed as a function of the cut inputs, and only those cuts
whose logic function can be expressed by the macro-cell are used
for mapping. However, methods based on cut enumeration cannot
be applied because a macro-cell often has 8 or more inputs, and
the number of 8-input cuts is extremely large and can be
computed only for the smallest benchmarks.
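The filtering step described above needs the local function of each cut. The hedged sketch below obtains it by bit-parallel simulation, which is one common approach. Each cut leaf receives the truth table of an elementary variable, and the AND nodes of the cone are then evaluated as Python integers. Two things here are hypothetical examples chosen for brevity, not a real architecture:

- the AIG encoding, in which fanins are stored as `(node, complemented)` pairs;
- the macro-cell checked by `fits_lut_and_gate`, a (K-1)-LUT whose output is ANDed with one remaining input.

```python
def cut_truth_table(aig, root, leaves):
    """Truth table of root over the cut leaves (bit m = value under minterm m).

    aig maps each AND node to ((fanin0, compl0), (fanin1, compl1)).
    Leaf i is variable x_i; raises KeyError if the leaves are not a cut of root.
    """
    num_bits = 1 << len(leaves)
    mask = (1 << num_bits) - 1
    values = {leaf: sum(1 << m for m in range(num_bits) if (m >> i) & 1)
              for i, leaf in enumerate(leaves)}

    def evaluate(n):
        if n not in values:
            (a, ca), (b, cb) = aig[n]
            va = evaluate(a) ^ (mask if ca else 0)   # complemented edge = bitwise NOT
            vb = evaluate(b) ^ (mask if cb else 0)
            values[n] = va & vb
        return values[n]

    return evaluate(root)

def fits_lut_and_gate(tt, k):
    """Hypothetical macro-cell: f = x_v AND g(other k-1 inputs) for some input v."""
    num_bits = 1 << k
    for v in range(k):
        var_v = sum(1 << m for m in range(num_bits) if (m >> v) & 1)
        if (tt & ~var_v) == 0:        # f is 0 whenever x_v = 0
            return True
    return False

aig = {
    "o": (("b", True), ("c", True)),     # NOT b AND NOT c
    "f": (("a", False), ("o", True)),    # a AND (b OR c)
    "n1": (("a", False), ("b", True)),
    "n2": (("a", True), ("b", False)),
    "x": (("n1", True), ("n2", True)),   # XNOR(a, b)
}
tt_f = cut_truth_table(aig, "f", ["a", "b", "c"])
tt_x = cut_truth_table(aig, "x", ["a", "b"])
print(tt_f, fits_lut_and_gate(tt_f, 3))  # a AND (b OR c) fits the cell
print(tt_x, fits_lut_and_gate(tt_x, 2))  # XNOR does not
```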
This paper presents a new algorithm for high-quality mapping
whose runtime and memory requirements are linear in the number
of nodes in the subject graph. The proposed algorithm avoids
exhaustive cut enumeration by computing only a small fixed
number (typically, 5-10) of “good” K-feasible cuts at each node.
These are called priority cuts. The criteria used to prioritize the
cuts differ depending on the mapping goals. For example, when
mapping for depth, the cuts are prioritized first by depth, then by
the number of inputs, and finally by area. Experiments indicate
that such prioritization gives a depth-optimum mapping for 95%
of all benchmarks and LUT sizes, even if only one cut is stored at
each node! Increasing the number of priority cuts to 8 allows the
algorithm to avoid the area penalty of not enumerating all cuts,
while still offering dramatic improvements in memory and
runtime, compared to exhaustive cut enumeration.
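A minimal sketch of the priority-cut idea for depth-oriented mapping follows. At each node, only the cuts stored at the fanins are merged. The candidates are ranked by depth, then by number of inputs, then by an area estimate. At most `c` of them are kept, plus the trivial cut so that fanouts can still use this node as a leaf. Three details are assumptions made for illustration: the area-flow style estimate, the tie-breaking by leaf names, and the single forward pass. The algorithm described in Section 4 may differ in these details.

```python
from itertools import product

def priority_cuts(aig, order, k, c):
    """Depth-oriented mapping that stores at most c priority cuts per node.

    aig maps each AND node to its two fanins; other nodes are primary inputs.
    """
    fanout = {n: 0 for n in order}
    for a, b in aig.values():
        fanout[a] += 1
        fanout[b] += 1
    depth, flow, best, stored = {}, {}, {}, {}
    for n in order:                                   # topological order
        trivial = frozenset([n])
        if n not in aig:                              # primary input
            depth[n], flow[n], stored[n] = 0, 0.0, [trivial]
            continue
        a, b = aig[n]
        # Merge only the cuts stored at the fanins, not all of their cuts.
        candidates = {ca | cb for ca, cb in product(stored[a], stored[b])
                      if len(ca | cb) <= k}

        def rank(cut):
            d = 1 + max(depth[leaf] for leaf in cut)
            area = 1 + sum(flow[leaf] for leaf in cut)  # assumed area-flow estimate
            return (d, len(cut), area, sorted(cut))     # depth, inputs, area, tie-break

        kept = sorted(candidates, key=rank)[:c]       # the priority cuts of n
        best[n] = kept[0]
        d, _, area, _ = rank(best[n])
        depth[n], flow[n] = d, area / max(1, fanout[n])
        stored[n] = kept + [trivial]                  # at most c + 1 cuts per node
    return depth, best

aig = {"x": ("a", "b"), "y": ("c", "d"), "z": ("x", "y")}
order = ["a", "b", "c", "d", "x", "y", "z"]
depth, best = priority_cuts(aig, order, k=4, c=1)
print(depth["z"], sorted(best["z"]))
```

Even with c = 1, the root of this small tree gets the single-level cut {a, b, c, d}, because each fanin keeps both its best cut and its trivial cut. On larger, reconvergent networks a small c can miss better cuts. Per the text above, storing 8 cuts per node avoided the area penalty in the authors' experiments.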
For 6-input LUTs, with 8 priority cuts stored, memory is
reduced 10x and runtime 5x, compared to previous approaches,
while depth and area are comparable or better. For 8-input and
larger LUTs, the reduction in memory and runtime is about 50x.
The proposed algorithm is extended to sequential mapping,
which searches a combined space of combinational K-LUT
mapping and retiming of the resulting LUT network. This
integrated mapping leads to a 20% reduction in depth, compared
to the combinational LUT mapping followed by retiming
performed as a post-processing step.
We emphasize that although this paper was written with FPGA
mapping in mind and the experiments were done as such, cut-
based mapping for standard cells, macro-cells, super-gates, etc. is
similar. Priority cuts should be equally applicable to these and
other applications (e.g. rewriting), and similar improvements
can be expected.
The rest of the paper is organized as follows. Section 2
describes some background. Section 3 reviews the traditional
FPGA mapping algorithm. Section 4 describes the new algorithm.
Section 5 presents the extension to sequential mapping. Section 6
reports experimental results. Section 7 concludes the paper and
outlines future work.
1-4244-1382-6/07/$25.00 ©2007 IEEE 354