Congestion Estimation and Localization in FPGAs: A Visual Tool for Interconnect Prediction David Yeager Darius Chiu Guy Lemieux The University of British.

Presentation transcript:

Congestion Estimation and Localization in FPGAs: A Visual Tool for Interconnect Prediction. David Yeager, Darius Chiu, Guy Lemieux. The University of British Columbia, Department of Electrical and Computer Engineering. SLIP 2007

Outline: Motivations for congestion localization heuristics. Exploring heuristics (post-placement, pre-placement). Results. Future work.

3 FPGAs: Fixed Routing Architecture. Fixed channel width. Over 80% of resources are devoted to interconnect. Composed of repeated tiles, so routing resources are identical throughout. A design can fit within the logic resources yet lack the routing resources it needs.

4 FPGA Routing Architecture Design. Architecture design involves a retargetable CAD flow: cover a large number of customer benchmarks, explore different amounts of routing resources, and select the routing that performs best across all circuits. Routing resources must accommodate the majority of customer designs that fit in the FPGA's logic resources, which requires an excessive amount of fixed interconnect. Less fixed routing = higher density and performance, but more unroutable designs. More fixed routing = more wastage: a design can use 100% of the logic resources but can never use 100% of the routing resources, which results in excess programmable interconnect. Congestion-aware CAD improves routability and allows architects to get away with less excess programmable interconnect.

5 FPGA vs. ASIC Congestion Impact. Two CAD flows. All results are equal except that only one produces evenly distributed interconnect. ASIC world => no major advantage. FPGA world => smaller channel width, which allows for a denser FPGA architecture and reduces interconnect wastage. Locating congestion can help with this balancing.

6 Balanced Routing. [Figure; label: waste.]

7 Balanced Routing: Denser FPGA. [Figure; panel labels: Channel Width = 7, Channel Width = 3.]

8 Further Motivations for Congestion Localization. High-quality congestion estimation can be slow; it may not be realistic to update it with every move. Localization can give different weights to different nets, CLBs, and LUTs, with the weights updated at intervals. Example applications: SA optimization, Un/DoPack.

9 Motivations for accurate congestion estimation: Un/DoPack. [Flowchart; nodes: start with netlist; cluster; place; route; channel width constraint met? (yes/no); success; available area left? (yes/no); failure; congestion calculator; depopulated clustering; incremental place; incremental route; channel width constraint met? (yes/no).]

10 Motivations: Un/DoPack. [Flowchart; nodes: start with netlist; cluster; place; route; channel width constraint met? (yes/no); success; available area left? (yes/no); failure; congestion calculator; depopulated clustering; incremental place; route; channel width constraint met? (yes/no); congestion calculator (no).]

11 Motivations: Un/DoPack. [Flowchart; nodes: start with netlist; cluster; place; route; channel width constraint met? (yes/no); success; available area left? (yes/no); failure; congestion calculator; depopulated clustering; place; route; channel width constraint met? (yes/no); congestion calculator (yes/no).]

12 Motivations for accurate congestion localization: Un/DoPack. Identify regions in which to add white space.

13 Congestion Localization Measurement. Requirements: applicable before and after placement; can integrate into Un/DoPack; can be easily displayed visually. Solution: assign a congestion value to each CLB. This allows for localization before and after placement; assigning congestion to specific routing resources is not practical before placement. Quality measurement: perform full place and route. Real congestion = max tracks used on each side of the CLB. Compare to the estimate.

14 Quality Measurement: Fidelity vs. Accuracy. Previous work reports the accuracy of the estimated peak channel width against the actual one. It does not report localization quality, or fidelity. Congestion estimation requires both accuracy and fidelity. Accuracy is well studied, so fidelity is the focus of this work. Fidelity can always be scaled to an accuracy heuristic. Good localization is required to balance congestion. Fidelity = FPGA-centric measurement. [Figure labels: Higher Accuracy; Higher Fidelity; Poor Localization; Good Localization; Actual congestion from router.]

15 Measuring Fidelity. Linearly scale the estimated and real congestion maps so that the min and max congestion of both maps are equal. For each CLB, take the difference between its congestion estimate and its actual congestion value after place and route. % Error = average of the absolute values of the differences / peak CLB congestion. This is the average absolute normalized error (a.a.n.e.). M rows, M columns; E = Estimate, R = Real: $\text{a.a.n.e.} = \frac{100\%}{M^2 \cdot R_{\max}} \sum_{i=1}^{M} \sum_{j=1}^{M} \lvert E_{ij} - R_{ij} \rvert$
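The fidelity measurement described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' tool: the function names `rescale` and `aane_percent` are hypothetical, and it assumes the estimate is the map that gets rescaled onto the min/max range of the real (post-route) map.

```python
def rescale(grid, lo, hi):
    """Linearly map grid values so that min(grid) -> lo and max(grid) -> hi."""
    flat = [v for row in grid for v in row]
    g_min, g_max = min(flat), max(flat)
    if g_max == g_min:
        # A flat map has no spread to scale; pin every cell to lo (assumption).
        return [[float(lo) for _ in row] for row in grid]
    scale = (hi - lo) / (g_max - g_min)
    return [[lo + (v - g_min) * scale for v in row] for row in grid]


def aane_percent(estimate, real):
    """Average absolute normalized error (%) between two equally sized maps."""
    flat_real = [v for row in real for v in row]
    r_min, r_max = min(flat_real), max(flat_real)
    if r_max == 0:
        raise ValueError("peak real congestion is zero; error is undefined")
    # Assumption: the estimate is scaled onto the real map's range.
    scaled = rescale(estimate, r_min, r_max)
    total = sum(
        abs(e - r)
        for e_row, r_row in zip(scaled, real)
        for e, r in zip(e_row, r_row)
    )
    return 100.0 * total / (len(flat_real) * r_max)


if __name__ == "__main__":
    real = [[0, 2], [4, 8]]
    print(aane_percent([[0, 1], [2, 4]], real))  # same shape as real after scaling
    print(aane_percent([[0, 4], [0, 4]], real))
```

Because both maps share the same range after scaling, an estimate that is a scaled copy of the real map scores zero error, which is the sense in which this metric rewards fidelity rather than absolute accuracy.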

16 Exploring heuristics: Local Rent Exponent. Plot average cuts per partition size; line of best fit: $\log T = p \cdot \log G + \log t$, i.e. $T = t\,G^{p}$ (T = number of cuts, G = number of CLBs in the partition, as on the plot axes). p = Rent exponent. We will use this as our congestion value. [Figure: log (# of cuts) vs. log (# of CLBs); window size = 5.]
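As a rough illustration of the line of best fit on this slide, the sketch below recovers p and t from (partition size, average cuts) pairs by ordinary least squares in log-log space. The function name `fit_rent` and the input format are assumptions made for the example; how the partitions inside a window are generated is not shown here.

```python
import math


def fit_rent(points):
    """Fit log T = p * log G + log t by least squares.

    points: iterable of (G, T) pairs, where G is the number of CLBs in a
    partition and T is the average number of cut nets for that size.
    Returns (p, t).
    """
    points = list(points)
    if len(points) < 2:
        raise ValueError("need at least two partition sizes to fit a line")
    xs = [math.log(g) for g, _ in points]
    ys = [math.log(t) for _, t in points]
    n = len(points)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all partition sizes are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    p = sxy / sxx                       # slope = Rent exponent
    t = math.exp(mean_y - p * mean_x)   # intercept = log t
    return p, t


if __name__ == "__main__":
    # Synthetic data that follows T = 3 * G^0.6 exactly.
    data = [(g, 3.0 * g ** 0.6) for g in (1, 2, 4, 8, 16)]
    print(fit_rent(data))
```

With only a handful of partition sizes inside a small window, the fitted slope is sensitive to individual points, which matches the downside the deck lists for small windows.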

17 Demo

24 Exploring heuristics: Local Rent Exponent. Benefit: characterizes wire length distribution. Downsides: requires a lot of data points; better for characterizing entire circuits; a smaller window is subject to anomalies; a larger window loses locality of estimation; measures rate of change of cuts, not absolute value.

25 Exploring heuristics: Net cuts per region. The Rent exponent captures the rate of change of cuts => wire length distribution. The absolute number of cuts may be better for locality. Example: region size of 3x3. Count the number of nets crossing the region's boundary.
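A minimal sketch of the net-cut count for one region follows. It assumes a placed netlist given as lists of (row, col) pin locations, and it treats a net as crossing the boundary when it has at least one pin inside the region and at least one outside; nets that merely route through the region are not counted. The name `net_cuts` and the data layout are hypothetical.

```python
def net_cuts(nets, row0, col0, size=3):
    """Count nets that cross the boundary of a size x size region.

    nets: list of nets, each a list of (row, col) CLB locations of its pins.
    (row0, col0): top-left CLB of the region.
    """
    def inside(pin):
        r, c = pin
        return row0 <= r < row0 + size and col0 <= c < col0 + size

    cuts = 0
    for pins in nets:
        flags = [inside(p) for p in pins]
        # Cut net: some pins inside the region and some outside it.
        if any(flags) and not all(flags):
            cuts += 1
    return cuts


if __name__ == "__main__":
    nets = [
        [(0, 0), (1, 1)],          # entirely inside the 3x3 region at (0, 0)
        [(1, 1), (5, 5)],          # crosses the boundary
        [(4, 4), (5, 5)],          # entirely outside
        [(2, 2), (3, 0), (0, 1)],  # crosses the boundary
    ]
    print(net_cuts(nets, 0, 0, size=3))
```

Sliding the region over the array and storing the count at the region's centre CLB would give a per-CLB map comparable to the other heuristics.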

26

30 Post Processing Heuristic #1: Cartesian Blending. Blend step 0, on the grid of CLB congestion values (rows: A0 B0 C0 D0 / E0 F0 G0 H0 / I0 J0 K0 L0): F1 = (1-α)*F0 + α*(E0 + B0 + G0 + J0)/4; G1 = (1-α)*G0 + α*(F0 + C0 + H0 + K0)/4.

31 Post Processing Heuristic #1: Cartesian Blending. Blend step 1, on the grid of values from step 0 (rows: A1 B1 C1 D1 / E1 F1 G1 H1 / I1 J1 K1 L1): F2 = (1-α)*F1 + α*(E1 + B1 + G1 + J1)/4; G2 = (1-α)*G1 + α*(F1 + C1 + H1 + K1)/4.
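The blend steps above can be sketched as a repeated four-neighbour average. The slides only show interior cells (division by 4), so the handling of edge and corner CLBs here, averaging over whichever neighbours exist, is an assumption, as are the function names.

```python
def blend_step(grid, alpha):
    """One Cartesian blend step: new = (1 - alpha) * old + alpha * mean(neighbours)."""
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Up, down, left, right neighbours that fall inside the array.
            nbrs = [
                grid[rr][cc]
                for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= rr < rows and 0 <= cc < cols
            ]
            # Assumption: edge cells average over the neighbours they have.
            avg = sum(nbrs) / len(nbrs) if nbrs else grid[r][c]
            out[r][c] = (1 - alpha) * grid[r][c] + alpha * avg
    return out


def cartesian_blend(grid, alpha, steps):
    """Apply blend_step repeatedly; every step reads only the previous step's values."""
    for _ in range(steps):
        grid = blend_step(grid, alpha)
    return grid


if __name__ == "__main__":
    peak = [[0, 0, 0], [0, 8, 0], [0, 0, 0]]
    for row in cartesian_blend(peak, alpha=0.5, steps=1):
        print(row)
```

Writing each step into a fresh grid matters: F2 and G2 on the slide both depend on step-1 values, so updating cells in place would mix values from two different steps.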

32

39 Exploring heuristics: Bounding Box Overlap. Assign each CLB a value equal to the number of net bounding boxes it resides in. Zhuo et al. use this during every SA swap in VPR's placer, yielding an average channel width reduction of 7.1%.
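A minimal sketch of the bounding box overlap count, assuming each net is a list of (row, col) pin locations on a square size x size CLB array. The names `bbox` and `bbox_overlap_map` are illustrative only and do not correspond to VPR functions.

```python
def bbox(pins):
    """Return (row_min, col_min, row_max, col_max) of a net's pins."""
    rows = [r for r, _ in pins]
    cols = [c for _, c in pins]
    return min(rows), min(cols), max(rows), max(cols)


def bbox_overlap_map(nets, size):
    """For each CLB, count how many net bounding boxes contain it."""
    grid = [[0] * size for _ in range(size)]
    for pins in nets:
        r0, c0, r1, c1 = bbox(pins)
        for r in range(r0, r1 + 1):
            for c in range(c0, c1 + 1):
                grid[r][c] += 1
    return grid


if __name__ == "__main__":
    nets = [[(0, 0), (1, 1)], [(1, 1), (2, 2)]]
    for row in bbox_overlap_map(nets, 3):
        print(row)
```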

40

41 Exploring heuristics: Wire Length Per Area. Expected wire length of a net = ½ perimeter of its bounding box.

42 Exploring heuristics: Bounding Box. Probability that a net is routed through any given point in its bounding box = expected length / bounding box area.

43 Exploring heuristics: Wire Length Per Area. The ½-perimeter bounding box is not realistic for high fan-out nets. extra pin factor = min(BBWidth, BBHeight) * max(0, num_pins - 3); expected wire length = ½ bounding box perimeter + (extra pin factor) * α; probability of wire = expected wire length / area.
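The formulas on this slide can be turned into a per-CLB map as sketched below. Several details are assumptions that the slides do not spell out: BBWidth and BBHeight are taken as the number of CLB columns and rows the box covers, the ½ perimeter is taken as the conventional half-perimeter wire length (column span plus row span), the area is the number of CLBs in the box, and α is a tuning parameter the caller supplies. The function name is hypothetical.

```python
def wire_length_per_area(nets, size, alpha):
    """Accumulate, per CLB, the expected wire density of every net covering it."""
    grid = [[0.0] * size for _ in range(size)]
    for pins in nets:
        rows = [r for r, _ in pins]
        cols = [c for _, c in pins]
        r0, r1, c0, c1 = min(rows), max(rows), min(cols), max(cols)
        width = c1 - c0 + 1    # assumption: box width in CLB columns
        height = r1 - r0 + 1   # assumption: box height in CLB rows
        # Assumption: half-perimeter measured as span between extreme pins.
        half_perimeter = (width - 1) + (height - 1)
        extra_pin_factor = min(width, height) * max(0, len(pins) - 3)
        expected_length = half_perimeter + extra_pin_factor * alpha
        density = expected_length / (width * height)
        for r in range(r0, r1 + 1):
            for c in range(c0, c1 + 1):
                grid[r][c] += density
    return grid


if __name__ == "__main__":
    two_pin = [[(0, 0), (1, 1)]]
    print(wire_length_per_area(two_pin, 2, alpha=0.5))
    five_pin = [[(0, 0), (0, 2), (1, 1), (1, 0), (0, 1)]]
    print(wire_length_per_area(five_pin, 3, alpha=0.5)[0][0])
```

For nets with three or fewer pins the extra pin factor is zero, so the estimate reduces to the plain half-perimeter model of the preceding slides.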

50 Exploring heuristics: Bounding Box. Blending helps spread the probability distribution, so the probability outside the bounding box is > 0. [Figure labels: p(wire) = 0; p(wire) > 0.]

51 Post Processing Heuristic #2: Saturated Congestion. An ideal routing solution would have no channel width constraint. Congestion maps of an architecture without a channel width constraint would have sharper peaks. A channel width constraint places a ceiling on wire density, forcing routing into the vicinity of the ideal path. This ceiling and the resulting spreading of wire density can be emulated by saturating the congestion.
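The slides do not give the saturation procedure itself, so the following is only one possible reading: clip every CLB at a ceiling and hand the clipped excess to its four neighbours, repeating for a bounded number of passes. The function name, the equal split among neighbours, and the `passes` limit are all assumptions made for illustration.

```python
def saturate(grid, ceiling, passes=10):
    """Clip congestion at `ceiling`, spilling the excess to adjacent CLBs."""
    rows, cols = len(grid), len(grid[0])
    grid = [[float(v) for v in row] for row in grid]
    for _ in range(passes):
        spill = [[0.0] * cols for _ in range(rows)]
        moved = False
        for r in range(rows):
            for c in range(cols):
                excess = grid[r][c] - ceiling
                if excess <= 0:
                    continue
                nbrs = [
                    (rr, cc)
                    for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                    if 0 <= rr < rows and 0 <= cc < cols
                ]
                if not nbrs:
                    continue  # a 1x1 array has nowhere to spill
                grid[r][c] = float(ceiling)
                for rr, cc in nbrs:
                    spill[rr][cc] += excess / len(nbrs)
                moved = True
        if not moved:
            break
        for r in range(rows):
            for c in range(cols):
                grid[r][c] += spill[r][c]
    # Cells can still exceed the ceiling if `passes` runs out first.
    return grid


if __name__ == "__main__":
    peak = [[0, 0, 0], [0, 8, 0], [0, 0, 0]]
    for row in saturate(peak, ceiling=4):
        print(row)
```

This variant conserves the total congestion in the map, which mimics wires being pushed into the vicinity of their ideal path rather than disappearing.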

52

58 Exploring Heuristics: Single Pass Route. Pathfinder routes, calculates overuse, then reroutes. Use the first routing attempt as the congestion estimate: each CLB is assigned a congestion value based on the max # of tracks used on each side of the CLB. [Figure label: congestion = 4.]
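Given track usage from a single routing pass, the per-CLB value described above is a maximum over the four adjacent channel segments. The sketch below assumes a hypothetical data layout, not VPR's internal one: `h_use[r][c]` is the usage of the horizontal channel segment above CLB row r (so there is one more row of channels than of CLBs), and `v_use[r][c]` is the vertical segment to the left of CLB column c.

```python
def clb_congestion(h_use, v_use):
    """Per-CLB congestion = max tracks used on any of the CLB's four sides.

    h_use: (rows + 1) x cols usage of horizontal channel segments.
    v_use: rows x (cols + 1) usage of vertical channel segments.
    """
    rows = len(v_use)
    cols = len(h_use[0])
    return [
        [
            max(
                h_use[r][c],      # channel above the CLB
                h_use[r + 1][c],  # channel below the CLB
                v_use[r][c],      # channel to the left
                v_use[r][c + 1],  # channel to the right
            )
            for c in range(cols)
        ]
        for r in range(rows)
    ]


if __name__ == "__main__":
    h_use = [[1, 0], [2, 3], [0, 1]]
    v_use = [[0, 4, 1], [2, 0, 0]]
    print(clb_congestion(h_use, v_use))
```

The same function applied to the final routing gives the "real" congestion map that the estimates are compared against.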

59

65 Congestion Estimation Before Placement? All previous heuristics require spatial information, but no spatial information is available before placement. How can we accurately localize congestion without a placement?

66

67 Exploring Heuristics: Blending Pin Count. Cartesian blend: (needs placement info). Logical/Net blend: (does not require placement info).
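The blend equations for this slide did not survive in the transcript, so the sketch below is a guess at what a logical/net blend could look like: it reuses the Cartesian blend formula but replaces the four grid neighbours with the blocks that share a net with the block being updated. The function name, the neighbour definition, and the data layout are all assumptions.

```python
def net_blend(pin_count, nets, alpha, steps=1):
    """Blend each block's pin count with the mean of its netlist neighbours.

    pin_count: dict mapping block name -> pin count (the starting value).
    nets: list of nets, each a list of block names; every name must be a
          key of pin_count.
    """
    neighbours = {b: set() for b in pin_count}
    for blocks in nets:
        for b in blocks:
            neighbours[b].update(x for x in blocks if x != b)

    values = {b: float(v) for b, v in pin_count.items()}
    for _ in range(steps):
        nxt = {}
        for b, v in values.items():
            nb = neighbours[b]
            # Blocks with no neighbours keep their own value.
            avg = sum(values[x] for x in nb) / len(nb) if nb else v
            nxt[b] = (1 - alpha) * v + alpha * avg
        values = nxt
    return values


if __name__ == "__main__":
    counts = {"a": 4, "b": 2, "c": 0}
    print(net_blend(counts, [["a", "b"], ["b", "c"]], alpha=0.5))
```

Since the neighbour sets come from the netlist alone, this can run before any placement exists, which is the property the slide highlights.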

68

75 Error Produced By Each Heuristic for MCNC 20. [Chart; axis: a.a.n.e. %.]

76 Error Before and After Saturation and Blending. [Chart; axis: a.a.n.e. %.]

77 Speed vs. Fidelity. [Chart; axes: a.a.n.e. %, Time (s).]

78 Conclusion. Can quickly and accurately locate regions of high congestion. After placement: local Rent exponent; net cuts per region; bounding box overlap => improved => wire length per area; single pass route. Before placement: blending pin count => localize congestion before placement. Post processing improves all heuristics. Compare fidelity instead of accuracy: necessary for balancing FPGA interconnect. A visual, tunable tool is helpful for discovering / improving heuristics. Journey as important as destination.

79 Future Work. Integrating into Un/DoPack. Congestion-aware placement. Congestion-aware clustering. Congestion estimation before clustering.