Combinational and Sequential Mapping with Priority Cuts Alan Mishchenko Sungmin Cho Satrajit Chatterjee Robert Brayton UC Berkeley.


1 Combinational and Sequential Mapping with Priority Cuts Alan Mishchenko Sungmin Cho Satrajit Chatterjee Robert Brayton UC Berkeley

2 Outline
1. Traditional cut-based LUT mapping
2. Improved technology mapping with priority cuts
3. Sequential mapping
4. Other applications of priority cuts
5. Experimental results

3 Technology Mapping
Input: A Boolean network (And-Inverter Graph)
Output: A netlist of K-LUTs implementing the Boolean network, optimizing some cost function
[Figure: technology mapping transforms the subject graph into the mapped netlist]

4 k-feasible Cuts
[Figure: a graph with primary inputs a, b, c and internal nodes p, q, r]
A cut of a node n is a set of nodes in its transitive fan-in such that every path from the node to the PIs is blocked by nodes in the cut.
A k-feasible cut is a cut of size k or less.
The set {p, b, c} is a 3-feasible cut of node r. (It is also a 4-feasible cut.)
k-feasible cuts are important in FPGA mapping because the logic between a node and the nodes in its cut can be replaced by a k-LUT.
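The cut definition on this slide can be checked mechanically. The following is a minimal sketch, not code from the presentation: the function name `is_k_feasible_cut`, the `FANINS` dictionary, and the assumed wiring of the example graph (p fed by a and b, q by b and c, r by p and q) are illustrative assumptions. It only tests the blocking property and the size bound; it does not check that the cut is irredundant.

```python
def is_k_feasible_cut(fanins, node, cut, k):
    """Return True if `cut` has at most k nodes and blocks every path
    from `node` down to the primary inputs (PIs).

    `fanins` maps each internal node to its fanin nodes; PIs are absent.
    """
    if len(cut) > k:
        return False
    stack, seen = [node], set()
    while stack:
        n = stack.pop()
        if n in cut or n in seen:
            continue  # this path is blocked, or was already explored
        seen.add(n)
        if n not in fanins:
            return False  # reached a PI that is not in the cut
        stack.extend(fanins[n])
    return True


# Assumed structure of the slide's example graph (hypothetical wiring).
FANINS = {"p": ("a", "b"), "q": ("b", "c"), "r": ("p", "q")}

ok_3 = is_k_feasible_cut(FANINS, "r", {"p", "b", "c"}, 3)    # size 3, blocks all paths
ok_4 = is_k_feasible_cut(FANINS, "r", {"p", "b", "c"}, 4)    # also 4-feasible
too_big = is_k_feasible_cut(FANINS, "r", {"p", "b", "c"}, 2)  # size exceeds k
leaky = is_k_feasible_cut(FANINS, "r", {"p"}, 3)              # paths through q are open
```

Under these assumptions, `{p, b, c}` passes for k = 3 and k = 4, while `{p}` fails because the paths through q still reach b and c.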

5 k-feasible Cut Computation
[Figure: the example graph, with each node annotated by its set of cuts]
a: { {a} }
b: { {b} }
c: { {c} }
p: { {p}, {a, b} }
q: { {q}, {b, c} }
r: { {r}, {p, q}, {p, b, c}, {a, b, q}, {a, b, c} }
The set of cuts of a node is a ‘cross product’ of the sets of cuts of its children.
Any cut of size greater than k is discarded.
Computation is done bottom-up.
(P. Pan et al, FPGA ’98; J. Cong et al, FPGA ’99)
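The bottom-up cross-product computation can be sketched in a few lines. This is an illustrative sketch rather than the authors' implementation: `enumerate_cuts`, `FANINS`, and `ORDER` are hypothetical names, the graph wiring is inferred from the cut sets listed on the slide, and dominated cuts are not filtered out.

```python
from itertools import product


def enumerate_cuts(fanins, order, k):
    """Compute all k-feasible cuts of every node, bottom-up.

    `fanins` maps each two-input node to its pair of fanins (PIs are absent);
    `order` lists all nodes in topological order, PIs first.
    """
    cuts = {}
    for n in order:
        trivial = frozenset([n])
        if n not in fanins:
            cuts[n] = {trivial}  # a PI has only its trivial cut
            continue
        a, b = fanins[n]
        # Cross product of the fanin cut sets; drop unions larger than k.
        merged = {ca | cb for ca, cb in product(cuts[a], cuts[b])
                  if len(ca | cb) <= k}
        merged.add(trivial)
        cuts[n] = merged
    return cuts


# Assumed wiring of the slide's example graph.
FANINS = {"p": ("a", "b"), "q": ("b", "c"), "r": ("p", "q")}
ORDER = ["a", "b", "c", "p", "q", "r"]

cuts = enumerate_cuts(FANINS, ORDER, k=3)
```

With k = 3, `cuts["r"]` contains the five cuts shown for r on the slide; with k = 2, only `{r}` and `{p, q}` survive.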

6 Basic Mapping Algorithm
Depth-optimal LUT mapping of a DAG using all cuts at each node
Input: And-Inverter Graph
1. Compute K-feasible cuts for each node
2. Compute best arrival time at each node
   – In topological order (from PI to PO)
   – Compute the depth of all cuts and choose the best one
3. Perform area recovery
   – Using area flow
   – Using exact local area
4. Choose the best cover
   – In reverse topological order (from PO to PI)
Output: Mapped Netlist
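Steps 1, 2, and 4 of this flow can be illustrated with a compact sketch. It is a simplification under stated assumptions, not the presented tool: unit LUT delay, two-input nodes, k of at least 2, and no area recovery (step 3 is skipped). The names `map_depth_optimal`, `FANINS`, and `ORDER` are hypothetical.

```python
from itertools import product


def map_depth_optimal(fanins, order, outputs, k):
    """Unit-delay, depth-optimal LUT mapping using all k-feasible cuts.

    Returns (depth, cover), where cover maps each selected LUT root to its cut.
    Assumes k >= 2 and two-input nodes; area recovery is omitted.
    """
    cuts, depth, best = {}, {}, {}
    for n in order:  # topological order, from PIs to POs
        if n not in fanins:
            cuts[n], depth[n] = {frozenset([n])}, 0
            continue
        a, b = fanins[n]
        cand = {ca | cb for ca, cb in product(cuts[a], cuts[b])
                if len(ca | cb) <= k}
        # Best cut: smallest arrival time at the leaves, then fewest leaves.
        best[n] = min(cand, key=lambda c: (max(depth[x] for x in c),
                                           len(c), sorted(c)))
        depth[n] = 1 + max(depth[x] for x in best[n])
        cuts[n] = cand | {frozenset([n])}
    # Choose the cover in reverse topological direction, starting at the POs.
    cover, stack = {}, list(outputs)
    while stack:
        n = stack.pop()
        if n in cover or n not in fanins:
            continue
        cover[n] = best[n]
        stack.extend(best[n])  # the leaves of a chosen cut must be implemented too
    return depth, cover


FANINS = {"p": ("a", "b"), "q": ("b", "c"), "r": ("p", "q")}
ORDER = ["a", "b", "c", "p", "q", "r"]

depth3, cover3 = map_depth_optimal(FANINS, ORDER, ["r"], k=3)
depth2, cover2 = map_depth_optimal(FANINS, ORDER, ["r"], k=2)
```

On the example graph, k = 3 yields a single LUT at r with leaves {a, b, c} and depth 1, while k = 2 needs three LUTs (p, q, r) and depth 2.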

7 Area Recovery Summary
Area recovery heuristics
   – Area-flow (global view): chooses cuts with better logic sharing
   – Exact local area (local view): minimizes the number of LUTs needed to map each node
The results of area recovery depend on
   – The order of processing nodes
   – The order of applying the two passes
   – The number of iterations
This scheme works for the constant-delay model
   – Any change off the critical path does not affect the critical path
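Area flow is commonly defined as the area of the LUT plus the area flow of the cut's leaves, divided by the fanout count of the node, so that shared logic is charged only partially to each user. The sketch below follows that common definition; the slides do not spell out the formula, so treat it as an assumption. The name `area_flow`, the unit LUT area, and the fanout counts in the example are all hypothetical.

```python
def area_flow(cut, flow, fanouts, node, lut_area=1.0):
    """Area flow of `node` when implemented with `cut`.

    `flow` holds the area flow already computed for the leaves (0 for PIs);
    `fanouts` holds an estimated fanout count per node.
    """
    total = lut_area + sum(flow[leaf] for leaf in cut)
    return total / max(1, fanouts[node])


# Hypothetical example: p is shared by two fanouts, q and r by one.
flow = {"a": 0.0, "b": 0.0, "c": 0.0}
fanouts = {"p": 2, "q": 1, "r": 1}

flow["p"] = area_flow({"a", "b"}, flow, fanouts, "p")      # (1 + 0 + 0) / 2
flow["q"] = area_flow({"b", "c"}, flow, fanouts, "q")      # (1 + 0 + 0) / 1
via_pq = area_flow({"p", "q"}, flow, fanouts, "r")         # (1 + 0.5 + 1) / 1
via_abc = area_flow({"a", "b", "c"}, flow, fanouts, "r")   # (1 + 0 + 0 + 0) / 1
```

Here the cut {a, b, c} has the lower area flow for r, so an area-flow pass would prefer it when it does not violate the required time.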

8 Drawbacks of Traditional Mapping Based on Exhaustive Cut Enumeration
For large designs, there may be many k-feasible cuts
   – On the order of millions
Previous ways of dealing with the problem
   – Detect and remove cut dominance
   – Perform cut pruning
   – Store only cuts on the frontier of mapping
k    Average number of cuts per node
4    6
5    20
6    80
7    150
8    240

9 Outline
1. Traditional cut-based technology mapping
2. Improved technology mapping
3. Sequential mapping
4. Other applications of priority cuts
5. Experimental results

10 New Mapping Algorithm
Near-depth-optimal LUT mapping of a DAG using several cuts at each node
Input: And-Inverter Graph
1. Compute K-feasible cuts for each node
2. Compute arrival time at each node
   – In topological order (from PI to PO)
   – Compute the depth of all cuts and choose the best one
   – Compute at most C good cuts and choose the best one
3. Perform area recovery
   – Using area flow
   – Using exact local area
   – Re-compute at most C good cuts and choose the best one in each iteration
4. Choose the best cover
   – In reverse topological order (from PO to PI)
Output: Mapped Netlist

11 Computing Priority Cuts
Consider nodes in a topological order
   – At each node, merge two sets of fanin cuts (each containing C cuts), getting (C+1) * (C+1) + 1 cuts
   – Sort these cuts using a given cost function, select the C best cuts, and use them for computing priority cuts of the fanouts
   – Select the one best cut, and use it to map the node
Sorting criteria
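The merge, sort, and truncate loop can be sketched as follows. This is an illustration under assumptions, not the ABC implementation: the cost function (depth first, then cut size) is just one plausible sorting criterion, the names `priority_cuts`, `FANINS`, and `ORDER` are hypothetical, and k is assumed to be at least 2. Each fanin contributes its C stored cuts plus its own trivial cut, which gives the (C+1) * (C+1) merged candidates mentioned on the slide.

```python
from itertools import product


def priority_cuts(fanins, order, k, c):
    """Keep at most `c` best cuts per node instead of all k-feasible cuts.

    Returns (stored, depth): the ranked cuts kept at each node and the
    unit-delay depth achieved by the best one.
    """
    stored, depth = {}, {}

    def cut_depth(cut):
        return 1 + max(depth[leaf] for leaf in cut)

    for n in order:  # topological order, PIs first
        if n not in fanins:
            stored[n], depth[n] = [], 0
            continue
        a, b = fanins[n]
        # Each fanin offers its stored cuts plus its trivial cut.
        set_a = stored[a] + [frozenset([a])]
        set_b = stored[b] + [frozenset([b])]
        merged = {x | y for x, y in product(set_a, set_b) if len(x | y) <= k}
        # Assumed cost function: smaller depth first, then fewer leaves.
        ranked = sorted(merged,
                        key=lambda cut: (cut_depth(cut), len(cut), sorted(cut)))
        stored[n] = ranked[:c]           # C best cuts feed the fanouts
        depth[n] = cut_depth(ranked[0])  # the single best cut maps the node
    return stored, depth


FANINS = {"p": ("a", "b"), "q": ("b", "c"), "r": ("p", "q")}
ORDER = ["a", "b", "c", "p", "q", "r"]

stored1, depth1 = priority_cuts(FANINS, ORDER, k=3, c=1)
stored2, depth2 = priority_cuts(FANINS, ORDER, k=3, c=2)
```

Even with C = 1 the example still finds the depth-1 cut {a, b, c} for r, because the best cut of each fanin is enough to build it. In general a small C can miss the optimum, which is why the result is described as near-depth-optimal.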

12 Discussion
Complexity analysis
   – Traditional mapping algorithms
     FlowMap: O(Kmn) (J. Cong et al, TCAD ’94)
     CutMap: O(2Kmn^(⌊K/2⌋+1)) (J. Cong et al, FPGA ’95)
   – Proposed mapping algorithm
     O(KC^2 n)
K - max cut size
C - max number of cuts
n - number of nodes
m - number of edges

13 Priority Cuts: A Bag of Tricks
Compute and use priority cuts (a subset of all cuts)
Dynamically update the cuts in each mapping pass
Use different sorting criteria in each mapping pass
Include the best cut from the previous pass into the set of candidate cuts of the current pass
Consider several depth-oriented mappings to get a good starting point for area recovery
Use complementary heuristics for area recovery
Perform cut expansion as part of area recovery
Use efficient memory management

14 Outline
1. Traditional cut-based technology mapping
2. Improved technology mapping
3. Sequential mapping
4. Other applications of priority cuts
5. Experimental results

15 Sequential Mapping
That is, combinational mapping and retiming combined
Minimizes clock period in the combined solution space
Previous work
   – Pan et al, FPGA ’98
   – Cong et al, TCAD ’98
Our contribution: divide sequential mapping into steps
   1. Find the best clock period via sequential arrival time computation (Pan et al, FPGA ’98)
   2. Run combinational mapping with the resulting arrival/required times of the register outputs/inputs
   3. Perform final retiming to bring the circuit to the best clock period computed in Step 1

16 Sequential Mapping (continued)
Advantages
   – Uses priority cuts (L=1) for computing sequential arrival times very fast
   – Reuses efficient area recovery available in combinational mapping, with almost no degradation in LUT count and register count
   – Greatly simplifies implementation due to not computing sequential cuts (cuts crossing register boundary)
Quality of results
   – Leads to quality that is better (by ~15%) than combinational mapping followed by retiming, due to searching the combined search space
   – Achieves almost the same (-1%) clock period as the general sequential mapping with sequential cuts, due to using a transparent register boundary without computing sequential cuts

17 Outline
1. Traditional cut-based technology mapping
2. Improved technology mapping
3. Sequential mapping
4. Other applications of priority cuts
5. Experimental results

18 Speeding Up SAT Solving
Perform technology mapping into K-LUTs for area
   – Define area as the number of CNF clauses needed to represent the Boolean function of the cut
   – Run several iterations of area recovery
Reduces the number of CNF clauses by ~50%
   – Compared to a smart circuit-to-CNF translation (M. Velev)
Improves SAT solver runtime by 3-10x
   – Experimental results will be given later

19 Minimizing the Total Number of BDD Nodes Needed to Represent a Boolean Network
Perform technology mapping into K-LUTs for minimizing area under delay constraints
   – Define area of a cut as the number of BDD nodes needed to represent the Boolean function of the cut
   – Run delay-oriented mapping, followed by several iterations of area recovery

20 Cut Sweeping
Reduce the circuit by detecting and merging shallow equivalences (proposed by Niklas Een)
   – By “shallow” equivalences, we mean equivalent points, A and B, for which there exists a K-cut C (K < 16) such that F_A(C) = F_B(C)
   – A subset of “good” K-input priority cuts can be computed
   – The quality of a cut is determined by the number of fanouts of the cut leaves: the more fanouts, the more likely the cut is a common cut for two nodes
Cut sweeping quickly reduces the circuit
   – Typically achieves ~50% of the gain of SAT sweeping (Fraiging)
Cut sweeping is much faster than SAT sweeping
   – Typically 10-100x, for large designs
Can be used as a fast preprocessing step for (or a low-cost substitute for) SAT sweeping
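The equivalence test F_A(C) = F_B(C) amounts to comparing truth tables over a shared cut. The sketch below shows that comparison on a tiny And-Inverter Graph; it is an illustrative assumption-laden example, not the cut sweeping code itself. The function name `cut_function`, the edge encoding (fanin name plus an inversion flag), and the node names are hypothetical, and the cut passed in must be a valid cut of the node.

```python
from itertools import product


def cut_function(fanins, node, cut):
    """Truth table of `node` expressed over the leaves of `cut`.

    `fanins` maps each AND node to two (fanin, inverted) pairs.
    The table is a tuple of booleans, one per assignment of the sorted leaves.
    """
    leaves = sorted(cut)
    table = []
    for bits in product([False, True], repeat=len(leaves)):
        value = dict(zip(leaves, bits))

        def evaluate(n):
            if n not in value:
                (a, inv_a), (b, inv_b) = fanins[n]
                # `x != inv` applies the optional inverter on the edge.
                value[n] = (evaluate(a) != inv_a) and (evaluate(b) != inv_b)
            return value[n]

        table.append(evaluate(node))
    return tuple(table)


# Hypothetical AIG: A = (x & y) & z, B = x & (y & z), D = x & !(y & z).
FANINS = {
    "t1": (("x", False), ("y", False)),
    "A": (("t1", False), ("z", False)),
    "t2": (("y", False), ("z", False)),
    "B": (("x", False), ("t2", False)),
    "D": (("x", False), ("t2", True)),
}
CUT = frozenset(["x", "y", "z"])

same = cut_function(FANINS, "A", CUT) == cut_function(FANINS, "B", CUT)
differs = cut_function(FANINS, "A", CUT) != cut_function(FANINS, "D", CUT)
```

A and B have different structure but the same function over {x, y, z}, so they could be merged without a SAT call. A practical implementation would hash (cut, truth table) pairs rather than compare nodes pairwise.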

21 Sequential Resynthesis for Delay
Restructure logic along the tightest sequential loops to reduce delay after retiming (Soviani/Edwards, TCAD ’07)
   – Similar to sequential mapping
   – Computes sequential arrival times for the circuit
   – Uses the current logic structure, as well as the logic structure transformed using Shannon expansion w.r.t. the latest-arriving variables
   – Accepts transforms leading to delay reduction
   – In the end, retimes to the best clock period
The improvement is 7-60% in delay with 1-12% area degradation (ISCAS circuits)
This algorithm could benefit from the use of priority cuts

22 Outline
1. Traditional cut-based technology mapping
2. Improved technology mapping
3. Sequential mapping
4. Other applications of priority cuts
5. Experimental results

23 Experimental Comparison
Compare the new mapping against the traditional mapping in terms of
   – Delay
   – Area
   – Runtime
   – Memory
Compare on large industrial benchmarks with choices
Analyze the performance of the new mapping for
   – Large designs
   – Large LUTs
Explore the potential of sequential mapping
Computer used for experiments
   – IBM ThinkPad laptop with a 1.6 GHz CPU and 2 GB of RAM

24 Priority Cuts vs. Cut Enumeration (C=8)
Used a set of large public benchmarks

25 Priority Cuts vs. Cut Enumeration (K=6, C=16)
[Chart: priority cuts vs. cut enumeration, for mapping w/o choices and mapping with choices]
Used a set of large industrial benchmarks

26 Performance on Large Designs (C=1)
Using design wb_conmax.v (part of IWLS 2005 benchmarks)
This is a WISHBONE Interconnect Matrix IP core. It can interconnect up to 8 Masters and 16 Slaves.
Source: http://www.opencores.org

27 Performance for Large LUTs (C=1)
Using 100 timeframes of design wb_conmax.v

28 Sequential Mapping (K=6, C=8)
Used a subset of ISCAS benchmarks, for which retiming reduced delay

29 Summary
Reviewed traditional technology mapping
   – Cut computation
   – Optimum-depth mapping
   – Area recovery
Presented an improved approach to mapping
   – Computes a small number of cuts at each node
   – Uses new ideas to dramatically reduce memory and runtime
Reported experimental results
   – Compared priority cuts with exhaustive cut enumeration: delay and area are comparable or better by 1-3%; memory and runtime are greatly reduced (5x for 6-LUTs)
   – Showed performance on very large designs (2 sec to map 1M)
   – Compared combinational and sequential mapping
Implemented in ABC
   – Google: “abc berkeley” (package “if”)

30 The End

