
1 WireMap: FPGA Technology Mapping for Improved Routability
Stephen Jang, Xilinx Inc.
Billy Chan, Xilinx Inc.
Kevin Chung, Xilinx Inc.
Alan Mishchenko, UC Berkeley

2 Outline
1. Motivation and algorithm overview
2. Review of area recovery
3. Algorithm details
4. Results and summary

3 Motivation
- Generic
  - Cut-based mapping algorithms do well at minimizing logic level and area (LUT count)
  - Could we change cut-based mapping to improve the netlist for packing, placement, and routing?
- Specific 1
  - Fewer pin-to-pin connections should make the design easier to place and route
  - Could we come up with a mapping algorithm that minimizes the total number of connections in a design?
- Specific 2
  - Newer FPGAs allow two outputs per LUT
  - Could we produce a mapping that “packs” better into these dual-output LUTs?

4 Area Recovery Overview
1. Perform delay-optimal mapping first
   - Not all paths are critical
2. Perform area recovery on non-critical paths
   - Consider all nodes with positive slack
   - For each node, look for a different cut that reduces area
Area recovery heuristics
- Area-flow (global view): chooses cuts with better logic sharing
- Exact local area (local view): minimizes the number of LUTs by looking at one node at a time
- Both are important

5 Edge Recovery Overview
- Find a simple-to-compute metric to minimize edge count and create smaller LUTs
  - Definition: Edge = pin-to-pin connection between LUTs
- Cut-based area recovery algorithms can be extended to minimize edges!
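To make the edge definition concrete, here is a minimal Python sketch that counts edges in a mapped netlist. The dictionary representation and the names `count_edges` and `include_primary_inputs` are assumptions for illustration, not ABC data structures. The slides do not say whether connections from primary inputs are counted, so the sketch supports both readings.

```python
def count_edges(luts, include_primary_inputs=True):
    """Count pin-to-pin connections in a mapped netlist.

    luts maps each LUT output name to the list of signals on its input pins.
    A signal that is not a key of luts is treated as a primary input.
    """
    total = 0
    for fanins in luts.values():
        for signal in fanins:
            if include_primary_inputs or signal in luts:
                total += 1
    return total

# Hypothetical netlist: two 2-input LUTs feeding one 3-input LUT.
netlist = {
    "p": ["a", "b"],
    "q": ["c", "d"],
    "f": ["p", "q", "e"],
}
all_pins = count_edges(netlist)            # every LUT input pin
lut_to_lut = count_edges(netlist, False)   # only pins driven by another LUT
```

Counting every LUT input pin is consistent with the later example slide, where each cut contributes its number of leaves to the edge cost.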

6 Edge Flow Cost Functions
- Edge flow phase
  - Use edge flow to minimize global edge count
- Exact local edge phase
  - Exactly minimize edge count within MFFCs (maximum fanout-free cones)

7 WireMap Algorithm
Input: And-Inverter Graph
1. Compute K-feasible cuts for each node
2. Compute best arrival time at each node
   - In topological order (from PI to PO)
   - Compute the depth of all cuts and choose the best one
3. Perform area and edge recovery
   - Using area flow and edge flow
   - Using exact local area and exact local edge
4. Choose the best cover
Output: Mapped Netlist
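Steps 1 and 2 of the algorithm can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the ABC implementation: the AIG is a dictionary from each AND node to its two fanins, inverters are ignored, every LUT has unit delay, and no dominated-cut pruning or cut-count limit is applied. The names `enumerate_cuts` and `best_arrival` are hypothetical.

```python
from itertools import product

def enumerate_cuts(fanins, order, k):
    """Enumerate K-feasible cuts bottom-up.

    fanins maps each AND node to its two fanin nodes; nodes absent from
    fanins are primary inputs. order is a topological order from PI to PO.
    """
    cuts = {}
    for node in order:
        node_cuts = {frozenset([node])}  # trivial cut: the node itself
        if node in fanins:
            left, right = fanins[node]
            for c0, c1 in product(cuts[left], cuts[right]):
                merged = c0 | c1
                if len(merged) <= k:  # keep only K-feasible cuts
                    node_cuts.add(merged)
        cuts[node] = node_cuts
    return cuts

def best_arrival(fanins, order, cuts):
    """Pick, for every AND node, a non-trivial cut with minimum depth."""
    arrival, best_cut = {}, {}
    for node in order:
        if node not in fanins:
            arrival[node] = 0  # primary inputs arrive at time 0
            continue
        best = None
        for cut in cuts[node]:
            if cut == frozenset([node]):
                continue  # the trivial cut cannot implement the node
            depth = 1 + max(arrival[leaf] for leaf in cut)
            if best is None or depth < best[0]:
                best = (depth, cut)
        arrival[node], best_cut[node] = best
    return arrival, best_cut

# Hypothetical AIG: f = (a AND b) AND (c AND d)
fanins = {"n1": ("a", "b"), "n2": ("c", "d"), "f": ("n1", "n2")}
order = ["a", "b", "c", "d", "n1", "n2", "f"]
cuts = enumerate_cuts(fanins, order, k=4)
arrival, best_cut = best_arrival(fanins, order, cuts)
```

With K = 4 the whole cone of `f` fits in one LUT, so `f` has arrival time 1. With K = 3 the four-input cut is no longer feasible and the best arrival time becomes 2.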

8 Algorithm - Edge Flow
1. Do delay-optimal mapping
2. Compute slack at each node
3. Do area recovery with area-flow
   - Visit nodes in topological order from PI to PO
   - Choose cuts that do not exceed the slack budget and have the smallest area-flow
   - If two cuts have the same area-flow, choose the cut with the lower edge-flow
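The selection rule in step 3 can be sketched as a lexicographic comparison of (area-flow, edge-flow) among the cuts that meet timing. The edge-flow formula used here, number of leaves plus the sum of each leaf's edge-flow divided by its fanout, is inferred from the worked example on the later "Example" slide and from the analogy with area-flow; it is an assumption of this sketch. All names and numeric values below are hypothetical.

```python
def flow_costs(cut, area_flow, edge_flow, fanout):
    """Return (area-flow, edge-flow) of a cut from the flows of its leaves."""
    af = 1.0 + sum(area_flow[leaf] / fanout[leaf] for leaf in cut)
    ef = len(cut) + sum(edge_flow[leaf] / fanout[leaf] for leaf in cut)
    return round(af, 9), round(ef, 9)  # rounding keeps float ties comparable

def choose_cut(candidates, arrival, required, area_flow, edge_flow, fanout):
    """Among cuts within the slack budget, minimize area-flow, then edge-flow."""
    best_key, best_cut = None, None
    for cut in candidates:
        depth = 1 + max(arrival[leaf] for leaf in cut)
        if depth > required:
            continue  # this cut would violate the required time
        key = flow_costs(cut, area_flow, edge_flow, fanout)
        if best_key is None or key < best_key:
            best_key, best_cut = key, cut
    return best_cut, best_key

# Hypothetical leaf data: c and d are primary inputs; p, q, r are mapped LUTs.
arrival = {"c": 0, "d": 0, "p": 1, "q": 1, "r": 2}
area_flow = {"c": 0, "d": 0, "p": 1, "q": 2, "r": 1}
edge_flow = {"c": 0, "d": 0, "p": 2, "q": 4, "r": 2}
fanout = {"c": 1, "d": 1, "p": 1, "q": 2, "r": 4}
candidates = [("q", "c", "d"), ("p", "c"), ("r", "c")]

chosen, costs = choose_cut(candidates, arrival, 2, area_flow, edge_flow, fanout)
```

In this data the cuts `("q", "c", "d")` and `("p", "c")` tie on area-flow, so the lower edge-flow decides in favor of `("p", "c")`. The cut `("r", "c")` has the smallest area-flow but is rejected because it exceeds the required time.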

9 Algorithm - Exact Local Edges
1. After optimization with area flow + edge flow
   - Described on the previous slide
2. Do edge recovery with exact edges
   - Visit nodes in topological order from PI to PO
   - Among all cuts within the slack budget, choose the cut with the smallest area; to break ties, choose the cut with the lower number of edges
Note: Unlike edge-flow, no estimation is involved
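One common way to compute exact local costs is reference counting: temporarily reference a candidate cut, count the LUTs and edges that become newly required, then undo the change. The sketch below follows that idea. It is an illustration under assumptions, not the ABC code; the names `ref_cut`, `deref_cut`, and `exact_cost`, and the small netlist, are hypothetical. The netlist is chosen so that the costs match the values quoted on the later "Example" slide (area 2 and 5 edges for cut {stq}, area 1 and 3 edges for cut {pef}), with `g` standing in for the input labeled f there.

```python
def ref_cut(cut, best_cut, refs):
    """Reference a cut; return (LUTs, edges) newly added to the mapping."""
    area, edges = 1, len(cut)
    for leaf in cut:
        if leaf not in best_cut:
            continue  # primary input: costs nothing
        if refs[leaf] == 0:  # leaf is not used elsewhere, so its LUT is added
            a, e = ref_cut(best_cut[leaf], best_cut, refs)
            area, edges = area + a, edges + e
        refs[leaf] += 1
    return area, edges

def deref_cut(cut, best_cut, refs):
    """Undo ref_cut; return (LUTs, edges) removed from the mapping."""
    area, edges = 1, len(cut)
    for leaf in cut:
        if leaf not in best_cut:
            continue
        refs[leaf] -= 1
        if refs[leaf] == 0:
            a, e = deref_cut(best_cut[leaf], best_cut, refs)
            area, edges = area + a, edges + e
    return area, edges

def exact_cost(cut, best_cut, refs):
    """Exact local (area, edges) of a cut, leaving refs unchanged."""
    cost = ref_cut(cut, best_cut, refs)
    deref_cut(cut, best_cut, refs)
    return cost

# Hypothetical state: s, t, p are already used by the mapping; q is not.
best_cut = {"s": ("a", "b"), "t": ("b", "c"), "q": ("d", "e"), "p": ("s", "t", "d")}
refs = {"s": 4, "t": 4, "q": 0, "p": 1}

cost_stq = exact_cost(("s", "t", "q"), best_cut, refs)
cost_peg = exact_cost(("p", "e", "g"), best_cut, refs)
winner = min([("s", "t", "q"), ("p", "e", "g")],
             key=lambda cut: exact_cost(cut, best_cut, refs))
```

Because the costs are tuples, `min` compares area first and uses the edge count only to break ties, which is the rule stated on the slide.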

10 Experimental Setup
- Implemented WireMap in ABC
- Compared WireMap against two algorithms in ABC
  - Baseline: basic mapping with area recovery
  - Mapping with Structure Choices (MSC): mapping with area recovery for several netlists produced by synthesis
- WireMap was implemented on top of MSC
- Used VPR to place and route the design and measure wirelength and critical path delay
  - Single LUT cluster, single-length wire segment model
- Used SIS to pack single-output LUTs into dual-output LUTs using maximum cardinality matching

11 Results Summary
- MSC is superior to baseline mapping
  - Single-output LUT count reduced by 9.1%
  - Edge count reduced by 8.1%
  - Dual-output LUT count reduced by 7.7%
- WireMap leads to a further reduction in edges by 9.3% and in dual-output LUT count by 9.4% versus MSC
  - Single-output LUT count reduced by only 1.3% relative to MSC
- WireMap's reduction of edges and dual-output LUTs is not directly related to single-output LUT reduction

12 Comparison of Area Recovery and Area/Edge Recovery Flow Mapping (K = 6)
[Table not preserved in the transcript; column groups: Area recovery, Area/Edge recovery]
- WireMap leads to dual-output LUT count reduction by 9.4%
- WireMap leads to further reduction in edges by 9.3%

13 Wirelength, Channel Width, and Critical Path Delay Comparison
[Table not preserved in the transcript; column groups: Area recovery, Area/Edge recovery]
twl = total wire length, mcw = minimum channel width required to route in VPR, cpd = critical path delay with minimum channel width, across the three implementations
- Wirelength was reduced by 8.5% vs. MSC
- Minimum channel width reduced by 6%
- Critical path delay reduced by 2.3%

14 WireMap Results - LUT Packing
The histogram shows how the single-output LUT size distribution is affected, leading to a 9.4% reduction in dual-output LUT6s.

LUT Distribution: MSC vs. WireMap (% of LUTs)
          LT2       LT3       LT4       LT5       LT6
MSC       4.71%     8.00%     15.87%    23.49%    47.93%
WireMap   10.12%    12.66%    17.89%    20.19%    39.14%
Increased with WireMap: LT2, LT3, LT4. Reduced with WireMap: LT5, LT6.
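As a rough check on how far the distribution shifts toward smaller LUTs, the percentages above can be summarized with a short calculation. The two summary figures computed here (average inputs per LUT and share of LUTs with at most four inputs) are derived from the slide's numbers and are not reported by the authors. The sketch assumes LT2 through LT6 denote LUTs with exactly 2 through 6 inputs; the function names are hypothetical.

```python
sizes = [2, 3, 4, 5, 6]  # assumed input counts for LT2..LT6
msc = [4.71, 8.00, 15.87, 23.49, 47.93]        # % of LUTs, from the slide
wiremap = [10.12, 12.66, 17.89, 20.19, 39.14]  # % of LUTs, from the slide

def average_inputs(percentages):
    """Weighted average number of inputs per LUT."""
    weighted = sum(k * p for k, p in zip(sizes, percentages))
    return weighted / sum(percentages)

def share_up_to(percentages, max_inputs):
    """Percentage of LUTs that use at most max_inputs inputs."""
    return sum(p for k, p in zip(sizes, percentages) if k <= max_inputs)

msc_avg = average_inputs(msc)
wiremap_avg = average_inputs(wiremap)
msc_small = share_up_to(msc, 4)
wiremap_small = share_up_to(wiremap, 4)
```

Under these assumptions the average comes out to about 5.02 inputs per LUT for MSC and about 4.66 for WireMap, and the share of LUTs with at most four inputs rises from about 28.6% to about 40.7%.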

15 Summary
- Presented cut-based structural mapping with minimization of the number of edges
- Extended area recovery to perform edge recovery
  - Area flow → Edge flow
  - Exact local area → Exact local edges
- Experimental results
  - Reduced the total number of pin-to-pin connections
  - Improved QoR after place-and-route
  - Improved packing by increasing the ratio of smaller LUTs

16 Backup Material

17 Backup Material: Technology Mapping
- Delay-optimal mapping
  - Delay-optimal mapping for all nodes
- Area recovery
  - Global area recovery
  - Local (exact) area recovery
[Figure: example network with node f and two candidate cuts; cut size K = 3]
- Cut {pqr} of node f has arrival time 3
- Cut {stu} of node f has arrival time 2

18 Appendix - How to Measure Area?
[Figure: example network showing cut {pcd} and cut {abq}]
Suppose we use the naïve definition:
  Area(cut) = 1 + [ Σ area(fanin) ]
- Area of cut {pcd} = 1 + [1 + 0 + 0] = 2
- Area of cut {abq} = 1 + [0 + 0 + 1] = 2
- Naïve definition says both cuts are equally good in area
- Naïve definition ignores sharing due to multiple fanouts

19 Appendix - Area-flow
[Figure: example network showing cut {pcd} and cut {abq}]
  area-flow(cut) = 1 + [ Σ ( area-flow(fanin) / fanout_num(fanin) ) ]
- Area-flow of cut {pcd} = 1 + [1 + 0 + 0] = 2
- Area-flow of cut {abq} = 1 + [0/1 + 0/1 + 1/2] = 1.5
- Area-flow “correctly” accounts for sharing
- Area-flow recognizes that cut {abq} is better
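The two definitions can be compared directly in a few lines of Python. The leaf values are read off the slide arithmetic: p and q each have area 1, q has two fanouts, p has one, and the primary inputs have area 0 (the slide writes their terms as 0/1). The function and variable names are assumptions for illustration.

```python
def naive_area(cut, area):
    """Naive area: 1 for this LUT plus the full area of every leaf."""
    return 1 + sum(area[leaf] for leaf in cut)

def area_flow(cut, flow, fanout):
    """Area-flow: each leaf's cost is shared among its fanouts."""
    return 1 + sum(flow[leaf] / fanout[leaf] for leaf in cut)

# Leaf values taken from the slide arithmetic.
leaf_area = {"a": 0, "b": 0, "c": 0, "d": 0, "p": 1, "q": 1}
leaf_fanout = {"a": 1, "b": 1, "c": 1, "d": 1, "p": 1, "q": 2}

naive_pcd = naive_area(("p", "c", "d"), leaf_area)
naive_abq = naive_area(("a", "b", "q"), leaf_area)
flow_pcd = area_flow(("p", "c", "d"), leaf_area, leaf_fanout)
flow_abq = area_flow(("a", "b", "q"), leaf_area, leaf_fanout)
```

The naive measure rates both cuts at 2, while area-flow gives 2 for {pcd} and 1.5 for {abq}, because half of q's cost is attributed to its other fanout.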

20 Appendix - Exact Local Area
[Figure: example network showing cut {stq} and cut {pef}]
  Exact-local-area(cut) = 1 + [ Σ exact-local-area(fanin with no other fanout) ]
- Cut {stq}
  - Area flow = 1 + [0.25 + 0.25 + 1] = 2.5
  - Exact area = 1 + 1 = 2 (due to q)
  - Area flow will choose this cut.
- Cut {pef}
  - Area flow = 1 + [(0.25 + 0.25 + 3)/2] = 2.75
  - Exact area = 1 + 0 = 1 (p is used elsewhere)
  - Exact area will choose this cut.

21 Example
[Figure: example network showing cut {stq} and cut {pef}]
  Exact-local-area(cut) = 1 + [ Σ exact-local-area(fanin with no other fanout) ]
- Cut {stq}
  - Area flow = 1 + [0.25 + 0.25 + 1] = 2.5
  - Edge flow = 3 + [2 + 4(0.25)] = 6
  - Exact area = 1 + 1 = 2 (due to q)
  - Exact edge = 3 + 2 = 5 (q is in the MFFC)
- Cut {pef}
  - Area flow = 1 + [(0.25 + 0.25 + 3)/2] = 2.75
  - Edge flow = 3 + [1 + 1 + (2.5 + 2.5 + 2)/2] = 8.5
  - Exact area = 1 + 0 = 1 (p is used elsewhere)
  - Exact edge = 3 + 0 = 3 (p is not in the MFFC)

22 Appendix - Tuning Mapping for Placement
- Placement-aware priority cost function
  - The total number of edges in a mapped network
- Advantages
  - Correlates with the total wire-length after placement
  - Easy to take into account during area recovery
- Treat “edges” as “area”, resulting in
  - Edge flow (similar to area flow)
  - Exact local edges (similar to exact local area)
- WireMap
  - New placement-aware mapping algorithm

23 Edge Recovery Overview
- Key: Find a simple-to-compute cut metric that minimizes edge counts and creates more small LUTs
1. Edge flow phase: Use the edge flow cost function to minimize global edge counts
2. Exact edge phase: Use an optimal algorithm to minimize edge counts within MFFCs

24 Appendix - Additional VPR Results
- VPR results for a 4-LUT cluster (resembles a commercial FPGA SLICE structure), normalized to Baseline

       Baseline   MSC      WireMap
CW     1.000      0.974    0.948
TWL    1.000      0.906    0.859

