Un/DoPack: Re-Clustering of Large System-on-Chip Designs with Interconnect Variation for Low-Cost FPGAs Marvin Tom* Xilinx Inc.

Presentation transcript:

Un/DoPack: Re-Clustering of Large System-on-Chip Designs with Interconnect Variation for Low-Cost FPGAs
Marvin Tom*, Xilinx Inc., San Jose, CA, USA (*Work performed at University of British Columbia)
David Leong, University of British Columbia, Vancouver, BC, Canada
Guy Lemieux, University of British Columbia, Vancouver, BC, Canada

2 Overview
Introduction, Goals and Motivation
– Reduce channel width, lower cost, make circuits “routable”
Benchmark Circuits
– Varying amount of interconnect variation
Un/DoPack CAD Tool
– Iterative channel width reduction by whitespace insertion
Results
Conclusion

4 Mesh-Based FPGA Architecture
– 9 logic blocks, 4 wires per channel: 3*4 = 12 total horizontal tracks
– 16 logic blocks, 4 wires per channel: 4*4 = 16 total horizontal tracks
Larger FPGAs have more “aggregate” interconnect
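The track arithmetic on this slide can be reproduced with a few lines of Python. This is only a sketch of the slide's own counting (one channel per row of logic blocks); the function name and parameters are illustrative and do not come from the talk.

```python
def aggregate_horizontal_tracks(blocks_per_side: int, wires_per_channel: int) -> int:
    """Total horizontal tracks, counted as on the slide: rows * wires per channel."""
    return blocks_per_side * wires_per_channel


# The two arrays shown on the slide: 3x3 (9 blocks) and 4x4 (16 blocks).
for side in (3, 4):
    total = aggregate_horizontal_tracks(side, wires_per_channel=4)
    print(f"{side * side} logic blocks: {total} total horizontal tracks")
```

The channel width stays at 4 in both cases; only the aggregate count grows with the array.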

5 Motivation: Area of FPGA Devices
Total Layout AREA = SIZE of Layout Tile * Number of Layout Tiles
[Figure: MCNC Circuits Mapped onto an FPGA]

6 Motivation: Channel Width Demand
Logic Range: User buys bigger device.
Interconnect Range: User has no choice!
Devices built for worst-case channel width (fixed width)
Interconnect dominates area (>70%)
[Figure: MCNC Circuits Mapped onto an FPGA]

7 Goal: Reduce Channel Width
But { apex4, elliptic, frisc, ex1010, spla, pdc } are unroutable.
Can we make them routable in a Constrained FPGA?
Altera Cyclone: Channel width constraint of 80 routing tracks
Constrained FPGA: Channel width constraint of 60 routing tracks
Smaller area, lower cost for low-channel-width circuits

8 Possible Solution
Trade off logic utilization for channel width
– User can always buy more logic (not more wires)
[Figure: logic block arrays of FPGA 1 and FPGA 2]
Trade-off: CLB count for channel width
What about area?

9 Features and Costs of Two FPGA Families
Sample Benchmark Circuit
– 10,000 LEs
– 150 Routing Tracks
– No Multipliers
– 100 K Memory
Altera Device | LEs | Memory | Mult. | Routing | Cost
Cyclone 1C12 | 12,060 | 239,616 | 0 | 80 | $56
Stratix 1S10 | 10,570 | 920,… | … | … | $190
Cyclone 1C20 | 20,060 | 294,912 | 0 | 80 | $100
Stratix 1S20 | 18,460 | 1,669,… | … | … | $350
Sample Benchmark Circuit
– 20,000 LEs
– 75 Routing Tracks
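The comparison this slide invites can be sketched as a cheapest-fit lookup. The sketch below uses only the two Cyclone rows, because the Stratix rows are truncated in this transcript. The `Device` type and `cheapest_fit` function are hypothetical names, and memory and multiplier requirements are left out for simplicity.

```python
from typing import List, NamedTuple, Optional


class Device(NamedTuple):
    name: str
    les: int
    routing_tracks: int
    cost_usd: int


# Only the Cyclone rows are complete on this slide; Stratix rows are omitted.
CYCLONE: List[Device] = [
    Device("Cyclone 1C12", 12_060, 80, 56),
    Device("Cyclone 1C20", 20_060, 80, 100),
]


def cheapest_fit(devices: List[Device], les: int, tracks: int) -> Optional[Device]:
    """Return the cheapest device with enough LEs and routing tracks, or None."""
    fits = [d for d in devices if d.les >= les and d.routing_tracks >= tracks]
    return min(fits, key=lambda d: d.cost_usd, default=None)


# First sample circuit: 10,000 LEs but 150 tracks, so no Cyclone part fits.
print(cheapest_fit(CYCLONE, les=10_000, tracks=150))
# Second sample circuit: 20,000 LEs and 75 tracks fits the Cyclone 1C20.
print(cheapest_fit(CYCLONE, les=20_000, tracks=75))
```

Under these assumptions, the circuit that needs twice the logic but half the channel width is the one that fits the low-cost family.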

11 GNL Circuit Benchmark Suite
Create benchmark circuits with variation
– SoC: Randomly integrate/stitch together “IP Blocks”
– IP Blocks have varied interconnect needs
Generate Netlist (GNL), Ghent University
– Synthetic benchmark generator
GNL circuits generated hierarchically
– Root → # I/Os, # IP blocks
– Second Level → 20 IP blocks, # LEs, Rent parameter

12 Rent Linear Interpolation 7 benchmark circuits Average Rent = 0.62, Stdev Rent = 0  /120 primary inputs/outputs

14 Un/DoPack Flow
Iterative non-uniform cluster depopulation tool
Step 1: Traditional SIS/VPR
Step 2: UnPack
– Congestion Calculator
Step 3: DoPack
– Incremental Re-Cluster
Step 4,5: Fast Place/Route
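The five steps above form a loop that repeats until the circuit meets the channel width constraint. The following is a minimal control-flow sketch, not the authors' tool: every function name and signature is hypothetical, and the individual steps are passed in as callables so that the loop itself can run against a toy model.

```python
from typing import Any, Callable, Tuple


def un_do_pack(
    netlist: Any,
    target_cw: int,
    pack: Callable[[Any], Any],                     # Step 1: initial clustering
    place_route: Callable[[Any], Tuple[int, Any]],  # returns (channel width, layout)
    unpack: Callable[[Any], Any],                   # Step 2: find congested region
    dopack: Callable[[Any, Any], Any],              # Step 3: re-cluster that region
    max_iters: int = 20,
) -> Tuple[Any, int, int]:
    """Iterate until the routed channel width meets target_cw (or give up)."""
    clustering = pack(netlist)
    cw, layout = place_route(clustering)
    iters = 0
    while cw > target_cw and iters < max_iters:
        region = unpack(layout)
        clustering = dopack(clustering, region)
        cw, layout = place_route(clustering)        # Steps 4, 5
        iters += 1
    return clustering, cw, iters


# Toy model (an assumption, for illustration only): the clustering is just the
# array side M, channel width is 600 // M, and each DoPack step adds one row/col.
result = un_do_pack(
    netlist=None,
    target_cw=50,
    pack=lambda n: 10,
    place_route=lambda m: (600 // m, None),
    unpack=lambda layout: None,
    dopack=lambda m, region: m + 1,
)
print(result)  # (12, 50, 2)
```

In the toy model the array grows from 10 to 12 CLBs per side before the channel width drops to the target of 50.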

15 Un/DoPack Flow: SIS/VPR Step 1: Traditional SIS/VPR

18 Un/DoPack Flow: UnPack Step 2: UnPack: –Congestion Calculator

19 Un/DoPack Flow: UnPack
Step 2: UnPack
– Generate Congestion Map
– CLB Label = Largest channel width (CW) occupancy in the 4 adjacent channels
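The labeling rule can be sketched as follows. The data layout is an assumption made for illustration: `h_occ` holds the occupancy of the horizontal channels above and below each CLB (M+1 rows by M columns), and `v_occ` holds the vertical channels to its left and right (M rows by M+1 columns). The talk does not say how the congestion map is stored.

```python
from typing import List


def clb_labels(h_occ: List[List[int]], v_occ: List[List[int]]) -> List[List[int]]:
    """Label each CLB with the largest occupancy among its 4 adjacent channels."""
    m = len(v_occ)  # the CLB array is m x m
    return [
        [
            max(
                h_occ[r][c],      # channel above the CLB
                h_occ[r + 1][c],  # channel below the CLB
                v_occ[r][c],      # channel to the left
                v_occ[r][c + 1],  # channel to the right
            )
            for c in range(m)
        ]
        for r in range(m)
    ]


# A 2x2 CLB array with made-up occupancy values.
h_occ = [[1, 2], [3, 9], [4, 5]]
v_occ = [[2, 6, 1], [0, 7, 3]]
print(clb_labels(h_occ, v_occ))  # [[6, 9], [7, 9]]
```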

20 Un/DoPack Flow: UnPack
Step 2: UnPack
– Depop Center = CLB with the largest label (M × M array)

21 Un/DoPack Flow: UnPack
Step 2: UnPack
– Option 1, Coarse Grain:
  Depop Radius = M/4
  Depop Amt: 1 new row/col in array (M × M array)

22 Un/DoPack Flow: UnPack
Step 2: UnPack
– Option 2, Fine Grain:
  Depop Radius = M/4, M/5, M/6, M/8
  Depop Amt: 1 new row/col in region (M × M array)
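Choosing the depopulation center and region from the CLB labels might look like the sketch below. Two assumptions are not stated in the talk: the region is taken as a square window of the given radius around the center, clipped to the array, and ties for the largest label go to the first CLB in row-major order. The function name and the `divisor` parameter are hypothetical.

```python
from typing import List, Tuple

Cell = Tuple[int, int]


def depop_region(labels: List[List[int]], divisor: int = 4) -> Tuple[Cell, int, List[Cell]]:
    """Return (center, radius, cells) for an M x M grid of CLB labels."""
    m = len(labels)
    cells = [(r, c) for r in range(m) for c in range(m)]
    # Depop center: the CLB with the largest label.
    center = max(cells, key=lambda rc: labels[rc[0]][rc[1]])
    # Depop radius: M / divisor, e.g. M/4 (coarse) or M/8 (fine), at least 1.
    radius = max(1, m // divisor)
    cr, cc = center
    region = [
        (r, c)
        for r in range(max(0, cr - radius), min(m, cr + radius + 1))
        for c in range(max(0, cc - radius), min(m, cc + radius + 1))
    ]
    return center, radius, region


# An 8x8 array with a single hot spot at row 3, column 4.
labels = [[0] * 8 for _ in range(8)]
labels[3][4] = 9
center, radius, region = depop_region(labels, divisor=4)
print(center, radius, len(region))  # (3, 4) 2 25
```

With `divisor=8` the same hot spot gives a radius of 1 and a 3 × 3 region, matching the idea of a finer-grained option.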

23 Un/DoPack Flow: DoPack Step 3: DoPack: –Incremental Re-Cluster

24 Un/DoPack Flow: Fast P&R Step 4,5: Fast Place/Route

25 Un/DoPack Flow: Fast P&R
Step 4,5: Fast Place/Route
Fast Placement
– UBC Incremental Placer (under development)
– VPR -fast
Fast Router
– Use illegal PathFinder solution from first iterations: unsuccessful so far
– Use full routed solution: slow but reliable

27 Un/DoPack: Baseline Flow
UnPack: Coarse-grained congestion calculator
DoPack: iRAC replica
Fast Place: UBC Incremental Placer
Fast Route: None
FPGA Architecture:
– LUT size (k) = 6
– Cluster size (N) = 16
– Inputs per cluster (I) = 51
– Wires of length (L) = 4

28 Area of GNL Benchmarks

29 Interconnect Variation: Impact on FPGA Architecture Design High Variation Circuits Require Wide Channel Width

30 Critical Path of GNL Benchmarks

31 Un/DoPack Congestion Map: Before and After Un/DoPack

32 Multi-Region Un-Pack
Depopulate multiple regions at once
– Depopulate each region separately
– Smaller radius = M/10
Handle overlapping regions

33 Normalized Area

34 Normalized Critical Path

35 Run-Time Comparisons

36 Conclusion
Un/DoPack: FPGA CAD flow
– Find “local” congestion → depopulate → reduced interconnect demand
FPGA benchmark circuit “suite”
– Stdev: Used to vary interconnect demand
Discoveries
– “Non-uniform” depopulation limits area inflation
– “Interconnect variation” important for area inflation and FPGA architecture design
– “Routing closure” achieved by re-clustering and incremental place & route
UNROUTABLE circuits made ROUTABLE → buy an FPGA with MORE LOGIC!

End of Talk