CS294-6 Reconfigurable Computing Day 12 October 1, 1998 Interconnect Population.


Today
–Costs of full population
–Tradeoffs
–Building blocks
–Empirical evidence

Symmetric

Symmetric Wire Channels
–IO ∝ N^p
–Bisection BW ∝ N^p
–W ∝ N^(p-0.5) for p>0.5
–W constant for p<0.5
–W ∝ log(N) for p=0.5
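As an illustrative sketch (not from the slides), the channel-width scaling above can be computed directly; the constant `c` and the example N=1024 are hypothetical values chosen for the demonstration:

```python
import math

def channel_width(n, p, c=1.0):
    """Estimated wire-channel width for an N-node design obeying
    Rent's rule (IO = c * N^p), laid out in two dimensions."""
    if p > 0.5:
        return c * n ** (p - 0.5)   # bisection N^p spread over N^0.5 channels
    if p == 0.5:
        return c * math.log2(n)     # boundary case grows logarithmically
    return c                        # p < 0.5: width stays constant

# Width grows with N only when p > 0.5:
print(channel_width(1024, 0.7))     # 1024^0.2, ~4.0
print(channel_width(1024, 0.5))     # log2(1024) = 10.0
print(channel_width(1024, 0.3))     # constant 1.0
```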

XC4K (Commercial Symmetric)

Symmetric Full Population
For simplicity: XC4K
–8 short, 4 double tracks per channel
–two sides with 6 connections each
–=> 10x10 switchbox + 2x (12x6) C-boxes
–2x12x6 + 4x10x30/2 = 744 switches
–2.5Kλ²/switch => 1.9Mλ²
–compare real XC4K at 1.25Mλ²
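A small Python sketch reproducing the slide's switch-count arithmetic; the 10-wire switchbox side (only 2 of the 4 double tracks switching at a given box) is an assumption consistent with the 10x10 figure above:

```python
W_SB = 10                             # wires per switchbox side (8 short + 2 doubles)
cbox = 2 * (12 * 6)                   # two 12-wire x 6-pin connection boxes
sbox = (4 * W_SB) * (3 * W_SB) // 2   # each pin to 3W pins on other sides, pairs counted once
switches = cbox + sbox
area = switches * 2500                # ~2.5K lambda^2 per switch
print(switches, area)                 # 744 switches, 1.86M lambda^2
```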

Wires vs. Switches
Full population:
–switch area ∝ W²: (4x3/2)W² = 6W² switches ≈ 15Kλ²·W²
–wire area ∝ W²: 8λW x 8λW = 64λ²·W²
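Since both terms scale as W², the comparison reduces to the constants; a quick sketch (areas in λ², the W=12 example is hypothetical):

```python
def switch_area(W):
    return 6 * W * W * 2500   # 6W^2 switches at ~2.5K lambda^2 each

def wire_area(W):
    return (8 * W) * (8 * W)  # 8-lambda-pitch wires in both directions

# Switch area dominates wire area by a large constant factor:
print(switch_area(12) // wire_area(12))   # ~234x
```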

Avoidable? How can we avoid N² switches?

Switching Problem
Connect any permutation of N sources to N sinks:
–crossbar: N² switches

Benes Network
–routes any permutation
–O(N log(N)) switches

Tradeoff
Crossbar:
–N² switches; N=16 => 256
–single switch in series
Benes:
–2 log_2(N) - 1 stages, N/2 2x2s per stage, 4 switches per 2x2
–4N log_2(N) - 2N switches; N=16 => 224 (with 4x4s, 192)
–2 log_b(N) - 1 switches in series (2x2 => 7, 4x4 => 3)
General trend: flatter => more total switches; more factored => longer switch series
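The counts on this slide can be checked with a short script; `benes_switches` charges r² switchpoints per r x r subswitch, as the slide does for 2x2s:

```python
import math

def crossbar_switches(n):
    return n * n

def benes_switches(n, radix=2):
    # 2*log_r(N) - 1 stages of N/radix subswitches, radix^2 points each
    stages = 2 * round(math.log(n, radix)) - 1
    return stages * (n // radix) * radix * radix

def benes_series(n, radix=2):
    # switches a signal traverses end to end: one per stage
    return 2 * round(math.log(n, radix)) - 1

print(crossbar_switches(16))                  # 256
print(benes_switches(16))                     # 224
print(benes_switches(16, radix=4))            # 192
print(benes_series(16), benes_series(16, 4))  # 7 3
```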

Concentrators
Select M signals from N (N>M)
–crossbar: NxM switches

Concentrators
Order of outputs often not important (e.g. LUT inputs)
–limit “crossbar” population to M x (N-M+1)
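A sketch of the savings from dropping the output-order requirement; the function names and the 20-choose-4 example are mine:

```python
def full_crossbar(n, m):
    # every output reaches every input
    return n * m

def concentrator(n, m):
    # when output order is irrelevant, output i only needs to reach
    # inputs i..i+(n-m), giving m * (n - m + 1) switches
    return m * (n - m + 1)

# Selecting 4 of 20 signals:
print(full_crossbar(20, 4), concentrator(20, 4))  # 80 68
```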

Switchboxes
Goal: route an input to an output on a different side
–doesn't matter which output, as long as the output can route on toward the destination

Switchboxes
Intro: 4 x (3W => W)
–6W² switches (push: does order matter?)

Switchboxes (Linear Population)
Connect each wire on each side to a single connection on each destination side
–linear total switch population
–switches = (3x4xW)/2 = 6W
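The gap between full and linear population grows quickly with channel width; a quick sketch of both counts from the slides above:

```python
def full_population(W):
    # each of the 4W wires to all 3W wires on the other sides, pairs counted once
    return 6 * W * W

def linear_population(W):
    # one switch per wire per destination side: (3 x 4 x W) / 2
    return 6 * W

for W in (4, 8, 16):
    print(W, full_population(W), linear_population(W))
```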

Switchboxes (Xilinx/Diamond)
Linear: connect to the same corresponding channel on each side

Switchboxes (Universal)
Linear
Principle: supports all sets of simultaneous connection requests up to the channel-width limit of the switchbox
–connection request: N-S, S-W, etc.

Switchboxes (domain schemes)
Asymptotically, Universal routes 25% more connection sets than Diamond (a strict superset)

Enough?
The universal switchbox guarantees each switchbox is locally routable
–if all four sides are unconstrained
But routing through one universal switchbox places constraints on neighboring connections
–no guarantee that a whole set of connections can be routed

Mapping Ratio
Partial population schemes
–=> switching limitations may prevent use of some channels
As a result, need more channels to detail route than the global route implies
–(counting wires in each channel segment)
Mapping ratio = detailed routing channels / global routing channels

Lack of CMR for Domain Schemes
Two negative results from UCSB; for any domain scheme (Diamond, Universal):
–detailed routing is NP-complete (figuring out which channels/switches to use; reduce graph coloring to domain routing)
–no Constant Mapping Ratio (CMR)

Toronto Experiments
Review Figs. 5 and 6 and Table 2 from Rose and Brown, JSSC v26 n3 p277, March 1991
Recommendations:
–3-4 connections per wire in the switchbox (linear schemes = 3); this predates the universal switchbox
–input switch population 79-90%, commensurate with M-choose-N structure

CMR -> “Perfect” Switchboxes
Universal was complete if unconstrained
Hybrid idea: build assuming one (or more) sides are unconstrained
–use linear switches on these sides
–fully populate constrained sides
–tightens the guaranteed routing bound

Greedy Routing Architecture

Constant Mapping Ratio
Linear => CMR = 1.5
Problem is using the 3rd side; can build a flat concentrator with
–W²/2 switches
–gives CMR = 1 for tree routes

Finishing Greedy Perfect Route
Tree connections routed with:
–2W + W²/2 switches
Must route one remaining side as before:
–3W²
Total:
–3.5W² + 2W (compare 6W² intro)
Use Benes for constrained sides:
–O(W log(W))
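A sketch tallying the greedy-perfect total against the fully populated baseline, under the slide's accounting (tree sides at 2W + W²/2, one remaining side at 3W²); even W values keep the counts integral:

```python
def intro_full(W):
    # fully populated switchbox baseline: 6W^2 switches
    return 6 * W * W

def greedy_perfect(W):
    # tree sides: 2W + W^2/2 switches; remaining side: 3W^2
    return 2 * W + W * W // 2 + 3 * W * W

for W in (8, 16, 32):
    print(W, greedy_perfect(W), intro_full(W))
```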

Summary
Even with the limited switching schemes we've explored, full population appears untenable; full population is often more than we really need.
Can we define switching structures with “nice” routing properties, a reasonable number of switches, and reasonable delay?

Summary (2)
Empirical evidence suggests “linear” populations are adequate
–the routing challenge arises from the lack of guarantees
–better theory as to why they are adequate?
–slight changes to improve guarantees and ease of routing?