Channel Width Reduction Techniques for System-on-Chip Circuits in Field-Programmable Gate Arrays Marvin Tom University of British Columbia Department of Electrical and Computer Engineering Vancouver, BC, Canada

2 Contributions
Two new FPGA benchmark circuit “suites”
  - Meta Circuit: mimic “System-on-Chip” design by randomly “stitching” real designs
  - Stdev: synthetic clones of Meta Circuit, used to vary interconnect demand
Two new FPGA CAD flows
  - DHPack: Design Hierarchy Packing
      Identify congested IP blocks -> depopulate -> reduced interconnect demand
      Conference paper: “Logic Block Clustering…”, published at DAC 2005
  - Un/DoPack: UnPack and DoPack
      Find “local” interconnect congestion -> depopulate -> reduced interconnect demand
      Conference paper, submitted to DAC 2006
Discoveries…
  - “Non-uniform” depopulation limits area inflation
  - “BLE limiting” gives better interconnect controllability than “Input limiting”
  - “Interconnect variation” important for area inflation and FPGA architecture design
  - “Routing closure” achieved by re-clustering and incremental place & route
      UNROUTABLE circuits made ROUTABLE -> buy an FPGA with MORE LOGIC!!!

3 Mesh-Based FPGA Architecture
[Figure: two mesh FPGAs drawn as square arrays of logic blocks (L)]
  - 9 logic blocks, 4 wires per channel: 3*4 = 12 total horizontal tracks
  - 16 logic blocks, 4 wires per channel: 4*4 = 16 total horizontal tracks
Larger FPGAs have more “aggregate” interconnect
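The track arithmetic on this slide can be reproduced with a short sketch. It assumes a square array and counts one horizontal channel per row of logic blocks, which is how the slide's 3*4 and 4*4 figures work out; the function name is illustrative and does not come from the talk.

```python
import math

def total_horizontal_tracks(num_logic_blocks: int, wires_per_channel: int) -> int:
    """Aggregate horizontal tracks, counted as on the slide: rows * wires per channel."""
    rows = math.isqrt(num_logic_blocks)
    if rows * rows != num_logic_blocks:
        raise ValueError("this sketch assumes a square array of logic blocks")
    return rows * wires_per_channel

# Same channel width, larger array: more aggregate interconnect.
for blocks in (9, 16):
    print(blocks, "logic blocks ->", total_horizontal_tracks(blocks, 4), "horizontal tracks")
```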

4 Logic Utilization vs. Channel Width
Trade off logic utilization for channel width
  - User can always buy more logic… (not more wires)
[Figure: FPGA 1 and FPGA 2, drawn as arrays of logic blocks (L)]
Trade-off: CLB count for channel width
But… can we achieve lower Total Area? ( = SIZE * CLB Count)
  (No! But we can break even!)

5 Logic Element: BLE and CLB
Basic Logic Element (BLE)
  - ‘k’-input LUT + FF
Configurable Logic Block (CLB)
  - ‘N’ BLEs, ‘N’ outputs
  - ‘I’ shared inputs
[Figure: CLB containing BLE #1 to BLE #5, with ‘I’ inputs and ‘N’ outputs]
Note: I < k*N

6 CLB Depopulation
General Approach
  - Use existing clustering tools
  - Do not fill CLB while clustering
1. Input-Limited
     E.g. maximum 67% input utilization per CLB
     Might use all BLEs
2. BLE-Limited
     E.g. maximum 60% BLE utilization per CLB
     Might use all inputs
[Figure: CLB containing BLE #1 to BLE #5, with ‘I’ inputs and ‘N’ outputs]
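The two depopulation modes can be illustrated with a minimal sketch. It is not the clustering tool used in the talk: the greedy in-order packer, the `BLE` record, and the names `pack`, `ble_limit_pct` and `input_limit_pct` are all assumptions made for illustration. The only idea taken from the slide is capping either the BLE count or the input count per CLB below its physical limit.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BLE:
    name: str          # also names the net driven by this BLE
    inputs: frozenset  # nets feeding the BLE's k-input LUT

def external_inputs(members):
    """Nets a cluster must bring in from outside (used but not driven inside)."""
    driven = {ble.name for ble in members}
    used = set().union(*(ble.inputs for ble in members))
    return used - driven

def pack(bles, N, I, ble_limit_pct=100, input_limit_pct=100):
    """Greedy in-order packing (hypothetical) with optional depopulation caps."""
    max_bles = max(1, N * ble_limit_pct // 100)      # BLE-limited cap
    max_inputs = max(1, I * input_limit_pct // 100)  # input-limited cap
    clusters = [[]]
    for ble in bles:
        current = clusters[-1]
        fits = (len(current) < max_bles
                and len(external_inputs(current + [ble])) <= max_inputs)
        if not fits and current:
            current = []
            clusters.append(current)
        current.append(ble)  # a BLE that exceeds the cap alone still gets a cluster
    return clusters

# Toy netlist (made up): 10 BLEs, each with 2 private input nets.
toy = [BLE(f"b{i}", frozenset({f"x{i}", f"y{i}"})) for i in range(10)]
for label, caps in [("full", {}),
                    ("BLE-limited 60%", {"ble_limit_pct": 60}),
                    ("input-limited 67%", {"input_limit_pct": 67})]:
    print(label, [len(c) for c in pack(toy, N=5, I=12, **caps)])
```

In this toy case both caps spread the same BLEs over more CLBs. A real packer would also weigh connectivity when choosing which BLE to add next.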

7 Reducing Channel Width
Results (max cluster size 16, max num inputs 51)
  - Input-Limited: no channel width control
  - BLE-Limited: (almost) monotonically increasing -> good channel width control

8 Meta Benchmark Circuit Creation
Mimic process of creating large designs
  - “IP Blocks”: MCNC circuits
  - SoC: randomly integrate/stitch together “IP Blocks”
  - IP Blocks have varied interconnect needs
Considered 3 stitching schemes…
  - Independent: IP blocks are not connected to each other
  - Pipeline: outputs of one IP block connected to inputs of next IP block
  - Clique: outputs of each IP block are uniformly distributed to inputs of all other IP blocks
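The three stitching schemes can be sketched as wire lists between IP blocks. This is an illustrative assumption of how the stitching could work, not the script used to build the Meta Circuits: the `IPBlock` record, the pin-by-pin pairing in the pipeline, and the round-robin distribution in the clique are hypothetical choices. Each output drives at most one input.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IPBlock:
    name: str
    num_inputs: int
    num_outputs: int

def stitch(blocks, scheme):
    """Return wires as (src_block, src_output, dst_block, dst_input) tuples."""
    if scheme == "independent":
        return []  # IP blocks are not connected to each other
    if scheme == "pipeline":
        # Outputs of one block feed the inputs of the next, pin by pin.
        return [(src.name, pin, dst.name, pin)
                for src, dst in zip(blocks, blocks[1:])
                for pin in range(min(src.num_outputs, dst.num_inputs))]
    if scheme == "clique":
        # Round-robin each block's outputs over all other blocks.
        next_free = {b.name: 0 for b in blocks}
        wires = []
        for i, src in enumerate(blocks):
            others = blocks[:i] + blocks[i + 1:]
            for pin in range(src.num_outputs if others else 0):
                dst = others[pin % len(others)]
                slot = next_free[dst.name]
                if slot < dst.num_inputs:  # skip when the target has no free input
                    wires.append((src.name, pin, dst.name, slot))
                    next_free[dst.name] = slot + 1
        return wires
    raise ValueError(f"unknown stitching scheme: {scheme}")

# Placeholder blocks with made-up pin counts.
demo = [IPBlock("ip_a", 2, 2), IPBlock("ip_b", 2, 2), IPBlock("ip_c", 2, 2)]
for scheme in ("independent", "pipeline", "clique"):
    print(scheme, len(stitch(demo, scheme)), "wires")
```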

9 DHPack: Meta Circuit P&R
Use VPR FPGA tools from University of Toronto
Observation 1
  - VPR placer successfully groups IP blocks from random initial placement
Observation 2
  - VPR router confirms channel width of MetaCircuit is dominated by a few IP blocks { pdc, clma, ex1010 }

10 DHPack: Meta Circuit P&R Results
Clique MetaCircuit
  - P&R channel width results closely match “constraints”
Shrink channel width
  - by ~20% (from 95 to 75): NO AREA INCREASE
  - by ~50% (from 95 to 50): 1.7x area increase
[Figures: normalized area vs. channel width constraint; routed channel width vs. channel width constraint]

11 Meta Circuits vs. Stdev Circuits
Meta Circuit Drawbacks
  - Design hierarchy boundaries not well-defined
  - Coarse-grained IP block boundary
  - Stitching unrealistic
      Flip-flop placed at every output
      Connections only have FO1
Stdev Circuits (created using GNL)
  - Synthetic clone of Meta circuits
  - Hierarchical -> specify Rent parameter of each partition
      Root -> # I/Os, # IP blocks
      Second level -> 20 IP blocks, # LEs, Rent parameter

12 Stdev Circuits: Rent Parameters
7 benchmark circuits
240/120 primary inputs/outputs, approx. 52,000 CLBs
Rent parameter: average 0.62, Stdev varied from 0.0 to 0.12
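For context, Rent's rule relates the number of external terminals T of a partition to its block count B as T = t * B^p, where t is the average number of terminals per block and p is the Rent parameter. The sketch below shows one way a set of per-IP-block Rent parameters with a chosen mean and standard deviation could be drawn. The Gaussian sampling, the re-centering step, and the function names are assumptions for illustration; the slides do not say how the values were chosen, and no GNL input syntax is shown here.

```python
import random
import statistics

def rent_terminals(num_blocks: int, t: float, p: float) -> float:
    """Rent's rule: expected external terminals T = t * B**p."""
    return t * num_blocks ** p

def sample_rent_parameters(count, mean, stdev, seed=0):
    """Draw `count` Rent parameters around `mean` (hypothetical Gaussian model)."""
    if stdev == 0:
        return [mean] * count
    rng = random.Random(seed)
    raw = [rng.gauss(mean, stdev) for _ in range(count)]
    shift = mean - statistics.fmean(raw)  # re-center so the sample mean is `mean`
    return [p + shift for p in raw]

# 20 IP blocks per circuit, mean 0.62, with the two extreme Stdev settings.
for stdev in (0.0, 0.12):
    params = sample_rent_parameters(20, 0.62, stdev)
    print(f"stdev={stdev}: min p={min(params):.2f}, max p={max(params):.2f}")
```

A higher spread means some IP blocks demand far more interconnect than others, while the average demand stays the same.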

13 Un/DoPack Flow
Iterative non-uniform cluster depopulation tool
Step 1: Traditional SIS/VPR
Step 2: UnPack
  - Congestion Calculator
Step 3: DoPack
  - Incremental Re-Cluster
Steps 4, 5: Fast Place/Route

14 Un/DoPack Flow: SIS/VPR
Step 1: Traditional SIS/VPR

17 Un/DoPack Flow: UnPack
Step 2: UnPack
  - Generate congestion map
  - CLB label = largest channel width (CW) occupancy in the 4 adjacent channels

18 Un/DoPack Flow: UnPack
Step 2: UnPack
  - Depopulation center = CLB with the largest label
[Figure: M x M array]

19 Un/DoPack Flow: UnPack
Step 2: UnPack
  - Depopulation radius = M/4
  - Depopulation amount: 1 new row/column in the array
[Figure: M x M array]
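The UnPack step described on these slides can be sketched as follows. This is an illustrative reading of the slides, not the Un/DoPack source: the channel-occupancy layout (`h_occ`, `v_occ`), the Manhattan distance used for the radius, the integer rounding of M/4, and the interpretation of "1 new row/column" as 2M+1 extra CLB sites are all assumptions.

```python
def clb_labels(h_occ, v_occ):
    """Label each CLB with the largest occupancy among its 4 adjacent channels.

    h_occ[r][c]: horizontal channel above CLB row r (r in 0..M), at column c.
    v_occ[r][c]: vertical channel left of CLB column c (c in 0..M), at row r.
    """
    M = len(v_occ)
    return [[max(h_occ[r][c], h_occ[r + 1][c], v_occ[r][c], v_occ[r][c + 1])
             for c in range(M)]
            for r in range(M)]

def depop_region(labels):
    """Pick the depopulation center (largest label) and the CLBs within M/4 of it."""
    M = len(labels)
    cells = [(r, c) for r in range(M) for c in range(M)]
    center = max(cells, key=lambda rc: labels[rc[0]][rc[1]])
    radius = M // 4  # assumption: integer division
    region = [(r, c) for r, c in cells
              if abs(r - center[0]) + abs(c - center[1]) <= radius]  # assumption: Manhattan
    return center, radius, region

def extra_sites_for_one_row_and_column(M):
    """CLB sites gained when an M x M array grows to (M+1) x (M+1)."""
    return (M + 1) ** 2 - M ** 2

# Made-up 4 x 4 example with one hot channel at the top edge.
M = 4
h_occ = [[1] * M for _ in range(M + 1)]
v_occ = [[1] * (M + 1) for _ in range(M)]
h_occ[0][2] = 9
center, radius, region = depop_region(clb_labels(h_occ, v_occ))
print("center", center, "radius", radius, "region", region)
print("extra CLB sites:", extra_sites_for_one_row_and_column(M))
```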

20 Un/DoPack Flow: DoPack
Step 3: DoPack
  - Incremental Re-Cluster

21 Un/DoPack Flow: Fast P&R
Steps 4, 5: Fast Place/Route
Fast placement
  - UBC Incremental Placer (under development)
  - VPR “-fast” option
Router
  - Use full routed solution
  - Slow but reliable

22 Peak / Avg / Stddev
Before: 120/79/27
After: 100/79/20

23 Normalized Area of GNL Benchmarks

24 Absolute Area of GNL Benchmarks

25 Interconnect Variation: Impact on FPGA Architecture Design
High-variation circuits require wide channel width

26 Contributions
Two new FPGA benchmark circuit “suites”
  - Meta Circuit: mimic “System-on-Chip” design by randomly “stitching” real designs
  - Stdev: synthetic clones of Meta Circuit, used to vary interconnect demand
Two new FPGA CAD flows
  - DHPack: Design Hierarchy Packing
      Identify congested IP blocks -> depopulate -> reduced interconnect demand
      Conference paper: “Logic Block Clustering…”, published at DAC 2005
  - Un/DoPack: UnPack and DoPack
      Find “local” interconnect congestion -> depopulate -> reduced interconnect demand
      Conference paper, submitted to DAC 2006
Discoveries…
  - “Non-uniform” depopulation limits area inflation
  - “BLE limiting” gives better interconnect controllability than “Input limiting”
  - “Interconnect variation” important for area inflation and FPGA architecture design
  - “Routing closure” achieved by re-clustering and incremental place & route
      UNROUTABLE circuits made ROUTABLE -> buy an FPGA with MORE LOGIC!!!

End of Talk