CS 258 Parallel Computer Architecture Lecture 5 Routing February 6, 2008 Prof John D. Kubiatowicz


Lec 5.2, 2/6/08, Kubiatowicz, CS258 ©UCB Spring 2008

Recall: Embeddings in two dimensions
When embedding a higher-dimensional network in a lower-dimensional one, either some wires are longer than others, or all wires are long
Note: for d > 2, wiring density is nonuniform!
[Figure: 6 x 3 x 2 embedding]

Recall: In the 3D world
For n nodes, bisection area is O(n^(2/3))
For large n, bisection bandwidth is limited to O(n^(2/3))
Paper: “Performance Analysis of k-ary n-cube Interconnection Networks” (here, n is the dimension!)
  – Bill Dally, IEEE Transactions on Computers, June 1990
  – Under the assumption of constant wire bisection
    » Lower n is better because it provides lower latency, less contention, and higher throughput
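The O(n^(2/3)) bound can be sketched in one line. This is a back-of-the-envelope argument, not taken from the slides: it assumes each node occupies constant volume, the machine is packed into a rough cube of side L, and every wire has a cross-section bounded from below. The symbols L, A, and B are introduced here for illustration only.

```latex
\[
  L \propto n^{1/3}, \qquad
  A_{\text{bisection}} \propto L^{2} = n^{2/3}, \qquad
  B_{\text{bisection}} = O\!\left(n^{2/3}\right)
\]
```

The number of wires that can cross the cutting plane is proportional to its area, so bisection bandwidth inherits the same bound.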

Dally paper (con’t)
Equal bisection: W = 1 for hypercube ⇒ W = ½k
Three wire models:
  – Constant delay, independent of length
  – Logarithmic delay with length (exponential driver tree)
  – Linear delay (speed of light/optimal repeaters)
[Figure: latency plots, panels “Logarithmic Delay” and “Linear Delay”]

Another Idea: Express Cubes
Problem: Low-dimensional networks have high k
  – Consequence: may have to travel many hops in a single dimension
  – Routing latency can dominate long-distance traffic patterns
Solution: Provide one or more “express” links
  – Like express trains, express elevators, etc.
    » Delay linear with distance, lower constant
    » Closer to “speed of light” in medium
    » Lower power, since no router cost
  – “Express Cubes: Improving Performance of k-ary n-cube Interconnection Networks,” Bill Dally, 1991
Another idea: route with pass transistors through links

The Routing problem: Local decisions
Routing at each hop: Pick next output port!

Routing
Recall: routing algorithm determines
  – which of the possible paths are used as routes
  – how the route is determined
  – R: N x N -> C, which at each switch maps the destination node n_d to the next channel on the route
Issues:
  – Routing mechanism
    » arithmetic
    » source-based port select
    » table driven
    » general computation
  – Properties of the routes
  – Deadlock free

Routing Mechanism
Need to select output port for each input packet
  – in a few cycles
Simple arithmetic in regular topologies
  – ex: Δx, Δy routing in a grid
    » west (-x): Δx < 0
    » east (+x): Δx > 0
    » south (-y): Δx = 0, Δy < 0
    » north (+y): Δx = 0, Δy > 0
    » processor: Δx = 0, Δy = 0
Reduce relative address of each dimension in order
  – Dimension-order routing in k-ary d-cubes
  – e-cube routing in n-cube
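The Δx, Δy port-selection table can be sketched directly in Python. This is a minimal illustration, assuming nodes are addressed by (x, y) tuples on a grid with no wrap-around; the names `xy_route_port`, `xy_path`, and `STEP` are hypothetical and not from the lecture.

```python
# Minimal sketch of delta-x, delta-y (dimension-order) routing on a 2D grid.
# Assumption: nodes are (x, y) tuples and the grid has no wrap-around links.

STEP = {"west": (-1, 0), "east": (1, 0), "south": (0, -1), "north": (0, 1)}


def xy_route_port(cur, dest):
    """Pick the output port at node `cur` for a packet headed to `dest`."""
    dx = dest[0] - cur[0]
    dy = dest[1] - cur[1]
    if dx < 0:
        return "west"
    if dx > 0:
        return "east"
    # The x offset is now zero, so the y offset may be reduced.
    if dy < 0:
        return "south"
    if dy > 0:
        return "north"
    return "processor"  # both offsets are zero: deliver locally


def xy_path(src, dest):
    """Follow the per-hop decisions and return the list of visited nodes."""
    path = [src]
    cur = src
    while True:
        port = xy_route_port(cur, dest)
        if port == "processor":
            return path
        sx, sy = STEP[port]
        cur = (cur[0] + sx, cur[1] + sy)
        path.append(cur)


print(xy_path((0, 0), (2, 1)))  # [(0, 0), (1, 0), (2, 0), (2, 1)]
```

Each hop needs only two subtractions and a few comparisons, which is why this kind of decision fits in a few cycles.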

Deadlock Freedom
How can it arise?
  – necessary conditions:
    » shared resource
    » incrementally allocated
    » non-preemptible
  – think of a channel as a shared resource that is acquired incrementally
    » source buffer, then dest. buffer
    » channels along a route
How do you avoid it?
  – constrain how channel resources are allocated
  – ex: dimension order
How do you prove that a routing algorithm is deadlock free?
  – Show that the channel dependency graph has no cycles!

Proof Technique
Resources are logically associated with channels
Messages introduce dependences between resources as they move forward
Need to articulate the possible dependences that can arise between channels
Show that there are no cycles in the Channel Dependence Graph
  – find a numbering of channel resources such that every legal route follows a monotonic sequence
  ⇒ no traffic pattern can lead to deadlock
Network need not be acyclic, just the channel dependence graph
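The proof technique can also be checked mechanically on a small network: enumerate every route, record which channel each packet holds while requesting the next, and search the resulting graph for a cycle. The sketch below assumes a 3 x 3 mesh with channels modeled as (from_node, to_node) pairs. All function names are hypothetical, and `mixed_path` is a deliberately bad routing function invented here for contrast, not something from the lecture.

```python
from itertools import product


def straight(path, axis, target):
    """Extend `path` along one axis until that coordinate equals `target`."""
    while path[-1][axis] != target:
        cur = list(path[-1])
        cur[axis] += 1 if target > cur[axis] else -1
        path.append(tuple(cur))


def xy_path(src, dst):
    path = [src]
    straight(path, 0, dst[0])  # x first
    straight(path, 1, dst[1])  # then y
    return path


def yx_path(src, dst):
    path = [src]
    straight(path, 1, dst[1])  # y first
    straight(path, 0, dst[0])  # then x
    return path


def mixed_path(src, dst):
    # Hypothetical bad policy: the source's parity decides between XY and YX.
    return xy_path(src, dst) if (src[0] + src[1]) % 2 else yx_path(src, dst)


def build_cdg(nodes, route):
    """Map each channel to the set of channels requested while holding it."""
    cdg = {}
    for src, dst in product(nodes, repeat=2):
        path = route(src, dst)
        channels = list(zip(path, path[1:]))
        for held, wanted in zip(channels, channels[1:]):
            cdg.setdefault(held, set()).add(wanted)
    return cdg


def has_cycle(cdg):
    """Depth-first search with white/grey/black marking."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {}

    def visit(channel):
        color[channel] = GREY
        for nxt in cdg.get(channel, ()):
            state = color.get(nxt, WHITE)
            if state == GREY:
                return True  # back edge: a dependence cycle exists
            if state == WHITE and visit(nxt):
                return True
        color[channel] = BLACK
        return False

    return any(color.get(c, WHITE) == WHITE and visit(c) for c in list(cdg))


nodes = [(x, y) for x in range(3) for y in range(3)]
print(has_cycle(build_cdg(nodes, xy_path)))     # False
print(has_cycle(build_cdg(nodes, mixed_path)))  # True
```

The mesh itself contains cycles in both cases; only the dependence graph induced by the routing function differs, which is the point of the last bullet above.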

Example: k-ary 2D array
Thm: Dimension-ordered (x,y) routing is deadlock free
Numbering
  – +x channel (i,y) -> (i+1,y) gets i
  – similarly for -x with 0 as most positive edge
  – +y channel (x,j) -> (x,j+1) gets N+j
  – similarly for -y channels
Any routing sequence (x direction, turn, y direction) is increasing
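The numbering argument can be verified exhaustively for a small array. The sketch below is one possible reading of the slide's numbering, with two assumptions that the slide does not spell out: N is taken to be k*k (any N >= k would do), and "similarly" for the negative directions is read as k - i for the -x channel leaving column i and N + (k - j) for the -y channel leaving row j. Function names are hypothetical.

```python
from itertools import product


def channel_number(src, dst, k):
    """Number a mesh channel so that XY routes see increasing numbers."""
    n_offset = k * k  # assumption: the slide's N, chosen as k*k here
    (x0, y0), (x1, y1) = src, dst
    if x1 == x0 + 1:
        return x0                   # +x channel (i,y) -> (i+1,y) gets i
    if x1 == x0 - 1:
        return k - x0               # -x: channel into column 0 is largest
    if y1 == y0 + 1:
        return n_offset + y0        # +y channel (x,j) -> (x,j+1) gets N+j
    if y1 == y0 - 1:
        return n_offset + (k - y0)  # -y: mirrored like -x
    raise ValueError("not a mesh channel")


def xy_path(src, dst):
    """Dimension-order path: correct x fully, then y."""
    path = [src]
    x, y = src
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


def route_is_increasing(src, dst, k):
    path = xy_path(src, dst)
    nums = [channel_number(a, b, k) for a, b in zip(path, path[1:])]
    return all(a < b for a, b in zip(nums, nums[1:]))


k = 4
nodes = [(x, y) for x in range(k) for y in range(k)]
print(all(route_is_increasing(s, d, k) for s, d in product(nodes, repeat=2)))
```

Because every route climbs strictly through the numbering, every edge of the channel dependence graph points from a lower number to a higher one, so no cycle can close.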

Channel Dependence Graph

Consider Trees
Why is the obvious routing on X deadlock free?
  – butterfly?
  – tree?
  – fat tree?
Any assumptions about routing mechanism? Amount of buffering?

Up*-Down* routing
Given any bidirectional network
Construct a spanning tree
Number the nodes, increasing from leaves to root
UP edges increase node numbers
Any Source -> Dest by UP*-DOWN* route
  – up edges, single turn, down edges
  – Proof of deadlock freedom?
Performance?
  – Some numberings and routes much better than others
  – interacts with topology in strange ways
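The construction can be sketched on a small graph. This is an illustration under stated assumptions, not the lecture's algorithm: the spanning tree is a breadth-first tree from a chosen root, nodes are integers, numbers are assigned by decreasing depth so the root is highest, and the route search is a breadth-first search over (node, phase) states. The example topology and all names are hypothetical.

```python
from collections import deque


def number_nodes(adj, root):
    """Number nodes so that numbers increase from the leaves toward the root."""
    depth = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in depth:  # first visit defines the BFS spanning tree
                depth[v] = depth[u] + 1
                queue.append(v)
    order = sorted(adj, key=lambda n: (-depth[n], n))  # deepest nodes first
    return {node: i for i, node in enumerate(order)}


def up_down_route(adj, num, src, dst):
    """Shortest route made of zero or more up hops, then zero or more down hops."""
    start = (src, "up")
    prev = {start: None}
    queue = deque([start])
    while queue:
        node, phase = queue.popleft()
        if node == dst:
            path, state = [], (node, phase)
            while state is not None:
                path.append(state[0])
                state = prev[state]
            return path[::-1]
        for nxt in adj[node]:
            going_up = num[nxt] > num[node]
            if going_up and phase == "down":
                continue  # a down -> up turn is forbidden
            state = (nxt, "up" if going_up else "down")
            if state not in prev:
                prev[state] = (node, phase)
                queue.append(state)
    return None


def is_up_then_down(path, num):
    """Check that a path never goes up after it has gone down."""
    seen_down = False
    for a, b in zip(path, path[1:]):
        if num[b] > num[a]:
            if seen_down:
                return False
        else:
            seen_down = True
    return True


# Hypothetical topology: a 4-cycle (0-1-3-2-0) with a pendant node 4 on node 3.
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
num = number_nodes(adj, root=0)
print(up_down_route(adj, num, 1, 2))  # [1, 0, 2]
```

The route from 1 to 2 goes through the root even though 1-3-2 is equally short, because 1-3-2 would go down and then up. This is a small instance of the performance concern raised above: traffic tends to concentrate near the root.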

Turn Restrictions in X,Y
XY routing forbids 4 of 8 turns and leaves no room for adaptive routing
Can you allow more turns and still be deadlock free?

Minimal turn restrictions in 2D
[Figure: permitted turns for north-last and negative-first routing; axes -x, +x, +y, -y]

Example legal west-first routes
Can route around failures or congestion
Can combine turn restrictions with virtual channels
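West-first routing can be sketched as a function that returns a set of permitted output ports rather than a single port. The version below is restricted to minimal routes for brevity (the non-minimal detours around failures mentioned above are not modeled), assumes (x, y) node addresses on a mesh, and uses hypothetical names throughout.

```python
# Minimal west-first routing: all westward hops are taken first, after which
# the packet may choose freely among the remaining productive directions.

STEP = {"west": (-1, 0), "east": (1, 0), "south": (0, -1), "north": (0, 1)}


def west_first_ports(cur, dest):
    """Return the output ports a minimal west-first router may choose from."""
    dx = dest[0] - cur[0]
    dy = dest[1] - cur[1]
    if dx < 0:
        return ["west"]  # no turn into west is allowed later, so go west now
    ports = []
    if dx > 0:
        ports.append("east")
    if dy > 0:
        ports.append("north")
    if dy < 0:
        ports.append("south")
    return ports


def all_routes(cur, dest):
    """Enumerate every minimal west-first route as a list of port names."""
    if cur == dest:
        return [[]]
    routes = []
    for port in west_first_ports(cur, dest):
        sx, sy = STEP[port]
        for rest in all_routes((cur[0] + sx, cur[1] + sy), dest):
            routes.append([port] + rest)
    return routes


print(all_routes((2, 0), (0, 1)))  # [['west', 'west', 'north']]
print(all_routes((0, 0), (1, 1)))  # [['east', 'north'], ['north', 'east']]
```

Westbound traffic gets a single deterministic route while eastbound traffic can adapt, so the adaptivity is uneven. That is the price of forbidding only the two turns into west.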

More examples:
What about wormhole routing on a ring?
Or: Unidirectional Torus of higher dimension?

Deadlock free wormhole networks?
Basic dimension order routing techniques don’t work for unidirectional k-ary d-cubes
  – only for k-ary d-arrays (bi-directional)
Idea: add channels!
  – provide multiple “virtual channels” to break the dependence cycle
  – good for BW too!
  – Do not need to add links, or xbar, only buffer resources
This adds nodes to the CDG; does it remove edges?
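A unidirectional ring makes the effect of virtual channels concrete. The sketch below assumes one common scheme, which the slide does not specify: two virtual channels per link, with a packet using VC 0 until it has wrapped past node k-1 and VC 1 afterwards. Channels are (link_start_node, vc) pairs, cycle detection relies on the standard-library `graphlib` module (Python 3.9+), and the function names are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def ring_route(src, dst, k, use_vcs):
    """Channels used from src to dst on a k-node unidirectional ring."""
    channels = []
    node = src
    while node != dst:
        # After wrapping around, the current node index is below the source.
        vc = 1 if (use_vcs and node < src) else 0
        channels.append((node, vc))
        node = (node + 1) % k
    return channels


def dependency_graph(k, use_vcs):
    """Map each channel to the set of channels held while requesting it."""
    graph = {}
    for src in range(k):
        for dst in range(k):
            route = ring_route(src, dst, k, use_vcs)
            for held, wanted in zip(route, route[1:]):
                graph.setdefault(held, set())
                graph.setdefault(wanted, set()).add(held)
    return graph


def is_deadlock_free(k, use_vcs):
    """True when the channel dependence graph has no cycle."""
    try:
        TopologicalSorter(dependency_graph(k, use_vcs)).prepare()
        return True
    except CycleError:
        return False


print(is_deadlock_free(4, use_vcs=False))  # False
print(is_deadlock_free(4, use_vcs=True))   # True
```

In this scheme the extra virtual channels add nodes to the dependence graph, and the edge that used to close the loop on VC 0 is redirected to VC 1, so the cycle unrolls into a chain.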

Breaking deadlock with virtual channels

Adaptive Routing
R: C x N x Σ -> C
Essential for fault tolerance
  – at least multipath
Can improve utilization of the network
Simple deterministic algorithms easily run into bad permutations
Fully/partially adaptive, minimal/non-minimal
Can introduce complexity or anomalies
A little adaptation goes a long way!

Use of virtual channels for adaptation
Want to route around hotspots/faults while avoiding deadlock
“An Adaptive and Fault Tolerant Wormhole Routing Strategy for k-ary n-cubes,”
  – Linder and Harden, 1991
  – General technique for k-ary n-cubes
    » Requires: 2^(n-1) virtual channels/lane!!!
Alternative: Planar adaptive routing
  – Chien and Kim, 1995
  – Divide dimensions into “planes”,
    » i.e. in 3-cube, use X-Y and Y-Z
  – Route planes adaptively in order: first X-Y, then Y-Z
    » Never go back to a plane once you have left it
    » Can’t leave a plane until the lowest coordinate has been routed
  – Use Linder-Harden technique for series of 2-dim planes
    » Now, need only 3 × number of planes virtual channels
Alternative: two phase routing
  – Provide set of virtual channels that can be used arbitrarily for routing
  – When blocked, use unrelated virtual channels for dimension-order (deterministic) routing
  – Never progress from deterministic routing back to adaptive routing

Summary
Routing algorithms restrict the set of routes within the topology
  – simple mechanism selects turn at each hop
  – arithmetic, selection, lookup
Deadlock-free if channel dependence graph is acyclic
  – limit turns to eliminate dependences
  – add separate channel resources to break dependences
  – combination of topology, algorithm, and switch design
Virtual Channels
  – Adds complexity to router
  – Can be used for performance
  – Can be used for deadlock avoidance
Deterministic vs adaptive routing