Channel Width Reduction Techniques for System-on-Chip Circuits in Field-Programmable Gate Arrays Marvin Tom University of British Columbia Department of Electrical and Computer Engineering Vancouver, BC, Canada

2 Contributions
Two new FPGA benchmark circuit “suites”
–Meta Circuit: mimic “System-on-Chip” design by randomly “stitching” real designs
–Stdev: synthetic clones of Meta Circuit, used to vary interconnect demand
Two new FPGA CAD flows
–DHPack: Design Hierarchy Packing
Identify congested IP blocks → depopulate → reduced interconnect demand
Conference paper: “Logic Block Clustering…”, published at DAC 2005
–Un/DoPack: UnPack and DoPack
Find “local” interconnect congestion → depopulate → reduced interconnect demand
Conference paper, submitted to DAC 2006
Discoveries…
–“Non-uniform” depopulation limits area inflation
–“BLE limiting” gives better interconnect controllability than “Input limiting”
–“Interconnect variation” important for area inflation and FPGA architecture design
–“Routing closure” achieved by re-clustering and incremental place & route
UNROUTABLE circuits made ROUTABLE → buy an FPGA with MORE LOGIC!!!

3 Mesh-Based FPGA Architecture
[Figure: two mesh FPGAs drawn as arrays of logic blocks (L)]
–9 logic blocks, 4 wires per channel: 3*4=12 total horizontal tracks
–16 logic blocks, 4 wires per channel: 4*4=16 total horizontal tracks
Larger FPGAs have more “aggregate” interconnect
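The track arithmetic on this slide can be sketched in a few lines of Python. The function name is hypothetical, and the count follows the slide's own convention of one channel per row of logic blocks; real architectures may count boundary channels differently.

```python
import math


def total_horizontal_tracks(num_logic_blocks: int, wires_per_channel: int) -> int:
    """Aggregate horizontal tracks of a square array of logic blocks.

    Assumption: one horizontal channel per row, as in the slide's
    3*4=12 and 4*4=16 examples.
    """
    side = math.isqrt(num_logic_blocks)
    if side * side != num_logic_blocks:
        raise ValueError("this sketch only handles square arrays")
    return side * wires_per_channel


print(total_horizontal_tracks(9, 4))
print(total_horizontal_tracks(16, 4))
```

The channel width stays at 4 in both cases; only the array size changes the aggregate count.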

4 Logic Utilization vs. Channel Width
Trade off logic utilization for channel width
–User can always buy more logic… (not more wires)
[Figure: FPGA 1 and FPGA 2 drawn as arrays of logic blocks (L)]
Trade-off: CLB count for channel width
But… can we achieve lower Total Area? ( = SIZE * CLB Count)
(No! But we can break even!)
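A rough sketch of the break-even idea follows. It reads SIZE as the area of one tile (a CLB plus its share of routing) and assumes that tile area grows linearly with channel width. That linear model and both coefficients (`logic_area`, `area_per_track`) are illustrative assumptions, not values from the talk.

```python
def tile_size(channel_width: int, logic_area: int = 100, area_per_track: int = 4) -> int:
    """Area of one tile in arbitrary units (hypothetical linear model)."""
    return logic_area + area_per_track * channel_width


def total_area(clb_count: int, channel_width: int) -> int:
    """Total Area = SIZE * CLB Count, with SIZE taken as the tile size."""
    return tile_size(channel_width) * clb_count


def break_even_clb_count(clb_count: int, old_width: int, new_width: int) -> int:
    """Largest CLB count at new_width whose total area does not exceed the original."""
    return total_area(clb_count, old_width) // tile_size(new_width)


# With these made-up coefficients, narrowing the channel from 95 to 75 tracks
# leaves room for extra CLBs before total area grows.
print(break_even_clb_count(1000, 95, 75))
```

Under this model, depopulation breaks even as long as the inflated CLB count stays at or below the value returned by `break_even_clb_count`.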

5 Logic Element: BLE and CLB
Basic Logic Element (BLE)
–‘k’-input LUT + FF
Configurable Logic Block (CLB)
–‘N’ BLEs, ‘N’ outputs
–‘I’ shared inputs
[Figure: CLB containing BLE #1 to BLE #5, with ‘I’ inputs and ‘N’ outputs]
Note: I < k*N

6 CLB Depopulation
General Approach
–Use existing clustering tools
–Do not fill CLB while clustering
1. Input-Limited
–E.g. maximum 67% input utilization per CLB
–Might use all BLEs
2. BLE-Limited
–E.g. maximum 60% BLE utilization per CLB
–Might use all inputs
[Figure: CLB containing BLE #1 to BLE #5, with ‘I’ inputs and ‘N’ outputs]
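The two depopulation modes amount to lowering one of the two capacity limits handed to the clustering tool. Below is a minimal sketch, assuming the cap is rounded down to a whole number of BLEs or inputs (the talk does not state the rounding rule); `cluster_caps` is a hypothetical helper, not part of any existing clustering tool.

```python
import math


def cluster_caps(num_bles: int, num_inputs: int,
                 ble_fraction: float = 1.0,
                 input_fraction: float = 1.0) -> tuple[int, int]:
    """Return (max BLEs, max inputs) a clusterer may use per CLB.

    Rounding down is an assumption made for this sketch.
    """
    return (math.floor(num_bles * ble_fraction),
            math.floor(num_inputs * input_fraction))


# Using the N=16, I=51 architecture quoted on the results slide:
print(cluster_caps(16, 51, ble_fraction=0.60))    # BLE-limited
print(cluster_caps(16, 51, input_fraction=0.67))  # Input-limited
```

In the BLE-limited case the input cap is left untouched, which is why a depopulated CLB "might use all inputs", and vice versa.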

7 Reducing Channel Width
Results (max cluster size 16, max num inputs 51)
Input-Limited
–No channel width control
BLE-Limited
–(Almost) monotonically increasing → good channel width control

8 Meta Benchmark Circuit Creation
Mimic process of creating large designs
–“IP Blocks”: MCNC circuits
–SoC: randomly integrate/stitch together “IP Blocks”
–IP Blocks have varied interconnect needs
Considered 3 stitching schemes…
–Independent: IP blocks are not connected to each other
–Pipeline: outputs of one IP block connected to inputs of next IP block
–Clique: outputs of each IP block are uniformly distributed to inputs of all other IP blocks
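At block level, the three stitching schemes differ only in which ordered pairs of IP blocks are connected. The sketch below illustrates that, with a hypothetical `stitch` function; it ignores pin-level details such as how outputs are spread over inputs, and it assumes the pipeline does not wrap around from the last block to the first.

```python
from itertools import permutations


def stitch(blocks: list[str], scheme: str) -> list[tuple[str, str]]:
    """Return block-level (source, sink) connections for a stitching scheme."""
    if scheme == "independent":
        return []                              # no inter-block connections
    if scheme == "pipeline":
        return list(zip(blocks, blocks[1:]))   # block i feeds block i+1
    if scheme == "clique":
        return list(permutations(blocks, 2))   # every block feeds every other block
    raise ValueError(f"unknown scheme: {scheme}")


ip_blocks = ["pdc", "clma", "ex1010"]
for scheme in ("independent", "pipeline", "clique"):
    print(scheme, stitch(ip_blocks, scheme))
```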

9 DHPack: Meta Circuit P&R
Use VPR FPGA tools from University of Toronto
Observation 1
–VPR placer successfully groups IP blocks from random initial placement
Observation 2
–VPR router confirms channel width of MetaCircuit is dominated by a few IP blocks { pdc, clma, ex1010 }

10 DHPack: Meta Circuit P&R Results
Clique MetaCircuit
–P&R channel width results closely match “constraints”
Shrink channel width
–by ~20% (from 95 to 75): NO AREA INCREASE
–by ~50% (from 95 to 50): 1.7x area increase
[Plots: Normalized Area vs. Channel Width Constraint; Channel Width (Constraint, Routed) vs. Channel Width Constraint]

11 Meta Circuits vs. Stdev Circuits
Meta Circuit Drawbacks
–Design hierarchy boundaries not well-defined
–Coarse-grained IP block boundary
–Stitching unrealistic
Flip-flop placed at every output
Connections only have FO1
Stdev Circuits (created using GNL)
–Synthetic clone of Meta circuits
–Hierarchical → specify Rent parameter of each partition
Root → # I/Os, # IP blocks
Second Level → 20 IP blocks, # LEs, Rent parameter

12 Stdev Circuits: Rent Parameters
7 benchmark circuits
–240/120 primary inputs/outputs, approx. 52,000 CLBs
–Rent parameter: average 0.62, Stdev varied from 0.0 to 0.12
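For readers unfamiliar with the Rent parameter: Rent's rule estimates the number of external terminals T of a partition of B blocks as T = t * B^p, where t is the average number of terminals per block and p is the Rent parameter. The sketch below draws one Rent parameter per IP block around the 0.62 average. The Gaussian draw, the seed, and the function names are assumptions made for illustration; the talk does not say how GNL was configured.

```python
import random
import statistics


def rent_terminals(num_blocks: int, terminals_per_block: float, rent_p: float) -> float:
    """Rent's rule: T = t * B**p."""
    return terminals_per_block * num_blocks ** rent_p


def sample_rent_parameters(count: int, mean: float, stdev: float, seed: int = 0) -> list[float]:
    """Draw one Rent parameter per IP block (Gaussian draw is an assumption)."""
    rng = random.Random(seed)
    return [rng.gauss(mean, stdev) for _ in range(count)]


# 20 second-level IP blocks, average Rent parameter 0.62, as on the slides.
for stdev in (0.0, 0.12):
    params = sample_rent_parameters(20, 0.62, stdev)
    print(stdev, round(min(params), 3), round(max(params), 3),
          round(statistics.pstdev(params), 3))
```

A larger spread means some IP blocks demand far more interconnect than the average, which is the variation the Stdev suite is meant to exercise.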

13 Un/DoPack Flow
Iterative non-uniform cluster depopulation tool
Step 1: Traditional SIS/VPR
Step 2: UnPack
–Congestion Calculator
Step 3: DoPack
–Incremental Re-Cluster
Step 4,5: Fast Place/Route

14 Un/DoPack Flow: SIS/VPR Step 1: Traditional SIS/VPR

17 Un/DoPack Flow: UnPack
Step 2: UnPack
–Generate Congestion Map
–CLB Label = largest channel width (CW) occupancy in the 4 adjacent channels

18 Un/DoPack Flow: UnPack
Step 2: UnPack
–Depop Center = largest CLB label
[Figure: M x M array]

19 Un/DoPack Flow: UnPack
Step 2: UnPack
–Depop Radius = M/4
–Depop Amt: 1 new row/col in array
[Figure: M x M array]
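The UnPack labelling and region selection described on the last three slides can be sketched as follows. The data layout (`h_occ` with M+1 rows of M horizontal segments, `v_occ` with M rows of M+1 vertical segments), the Manhattan distance metric, and the integer division for M/4 are all assumptions made for this sketch; the talk does not specify them, and the depopulation amount itself is not modelled here.

```python
def clb_labels(h_occ, v_occ):
    """Label each CLB with the largest occupancy among its 4 adjacent channels.

    h_occ[r][c]: horizontal channel segment above (r) or below (r+1) CLB row r.
    v_occ[r][c]: vertical channel segment left (c) or right (c+1) of CLB column c.
    """
    m = len(v_occ)
    return [[max(h_occ[r][c], h_occ[r + 1][c], v_occ[r][c], v_occ[r][c + 1])
             for c in range(m)]
            for r in range(m)]


def depop_region(labels):
    """Pick the depopulation center (largest label) and the CLBs within radius M/4."""
    m = len(labels)
    cells = [(r, c) for r in range(m) for c in range(m)]
    center = max(cells, key=lambda rc: labels[rc[0]][rc[1]])  # first maximum wins ties
    radius = m // 4
    region = [(r, c) for r, c in cells
              if abs(r - center[0]) + abs(c - center[1]) <= radius]
    return center, radius, region


# 4 x 4 array with a single congested horizontal channel segment.
M = 4
h_occ = [[0] * M for _ in range(M + 1)]
v_occ = [[0] * (M + 1) for _ in range(M)]
h_occ[2][1] = 9
center, radius, region = depop_region(clb_labels(h_occ, v_occ))
print(center, radius, region)
```

The CLBs in `region` are the candidates that DoPack would then re-cluster with lower utilization.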

20 Un/DoPack Flow: DoPack Step 3: DoPack: –Incremental Re-Cluster

21 Un/DoPack Flow: Fast P&R
Step 4,5: Fast Place/Route
Fast Placement
–UBC Incremental Placer (under development)
–VPR “–fast” option
Router
–Use full routed solution
–Slow but reliable

22 Peak / Avg / Stddev
Before: 120 / 79 / 27
After: 100 / 79 / 20

23 Normalized Area of GNL Benchmarks

24 Absolute Area of GNL Benchmarks

25 Interconnect Variation: Impact on FPGA Architecture Design High Variation Circuits Require Wide Channel Width

End of Talk
