Static and Dynamic Networks


Static and Dynamic Networks L.N. Bhuyan Partly from Berkeley Notes CS258 S99

Hypercubes
Also called binary n-cubes. Number of nodes: N = 2**n.
- O(log N) hops
- Good bisection bandwidth
- Complexity: out-degree is n = log N
- Routing: correct dimensions in order
- With random communication, 2 ports per processor
[Figure: hypercubes of dimension 0-D through 5-D]
9/20/2018 CS258 S99
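The "correct dimensions in order" rule can be sketched in a few lines. This is a minimal illustration, not code from the lecture: the function name `ecube_route` and the choice to fix bits from dimension 0 upward are assumptions made for the example.

```python
def ecube_route(src: int, dst: int, n: int) -> list[int]:
    """Return the nodes visited from src to dst in an n-dimensional hypercube.

    Differing address bits are corrected one dimension at a time,
    lowest dimension first (an assumed ordering).
    """
    path = [src]
    node = src
    for dim in range(n):
        bit = 1 << dim
        if (node ^ dst) & bit:   # addresses differ in this dimension
            node ^= bit          # cross the link along this dimension
            path.append(node)
    return path


path = ecube_route(0b000, 0b101, 3)
print([format(v, "03b") for v in path])  # ['000', '001', '101']
```

The hop count equals the number of differing address bits, so it never exceeds n = log N.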


Topology Summary

Topology      Degree     Diameter          Ave Dist       Bisection   D (D ave) @ P=1024
1D Array      2          N-1               N/3            1           huge
1D Ring       2          N/2               N/4            2
2D Mesh       4          2 (N**(1/2) - 1)  2/3 N**(1/2)   N**(1/2)    63 (21)
2D Torus      4          N**(1/2)          1/2 N**(1/2)   2N**(1/2)   32 (16)
k-ary n-cube  2n         nk/2              nk/4           nk/4        15 (7.5) @ n=3
Hypercube     n = log N  n                 n/2            N/2         10 (5)

- All have some “bad permutations”.
- Many popular permutations are very bad for meshes (transpose).
- Randomness in wiring or routing makes it hard to find a bad one!

How Many Dimensions?
n = 2 or n = 3
- Short wires, easy to build
- Many hops, low bisection bandwidth
- Requires traffic locality
n >= 4
- Harder to build, more wires, longer average length
- Fewer hops, better bisection bandwidth
- Can handle non-local traffic
k-ary d-cubes provide a consistent framework for comparison
- N = k**d
- Scale dimension (d) or nodes per dimension (k)
- Assume cut-through

Traditional Scaling: Latency(P)
- Assumes equal channel width, independent of node count or dimension
- Dominated by average distance

Average Distance
- ave dist = d (k-1)/2
- But equal channel width is not equal cost!
- Higher dimension => more channels
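The average-distance expression can be checked by brute force. The sketch below assumes a k-ary d-cube whose rings are unidirectional with wraparound, and it averages over all destinations including the source itself. Under those assumptions (which are mine, not stated on the slide) the count matches d (k-1)/2 exactly. The function name `average_distance` is hypothetical.

```python
from itertools import product


def average_distance(k: int, d: int) -> float:
    """Mean hop count from node 0 to every node of a k-ary d-cube.

    Assumes one-way rings with wraparound, so reaching offset c in a
    dimension costs exactly c hops. By symmetry node 0 is representative.
    """
    total = 0
    for coords in product(range(k), repeat=d):
        total += sum(coords)
    return total / k**d


for k, d in [(4, 2), (8, 2), (4, 3)]:
    # Brute-force value next to the closed form d (k-1)/2.
    print(k, d, average_distance(k, d), d * (k - 1) / 2)
```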

Latency under Contention
- Optimal packet size?
- Channel utilization?

Dynamic Networks

What Is a Dynamic Network?
A dynamic network can connect any input to any output by enabling or disabling switches in the network.
Examples:
- Shared bus: the bus arbiter connects a processor to a memory.
- Crossbar: consists of many switching elements, which can be enabled to connect many inputs to many outputs simultaneously.
- Multistage network: consists of several stages of switches that are enabled to set up connections.
- The nodes in static networks (such as a mesh) also contain dynamic crossbars.

Crossbar Switch Design
Complexity is O(N**2) for an NxN crossbar. Why? See the next slide.

How Do You Build a Crossbar?
[Figure: crossbar; label: From Control]
- N**2 switches => cost O(N**2)
- Time taken by the arbiter = O(N**2)
- Multiplexors are controlled by the controller

Crossbar Contd.
- An NxN crossbar allows all N inputs to be connected simultaneously to all N outputs.
- It allows all one-to-one mappings, called permutations. Number of permutations = N!
- When two or more inputs request the same output, only one of them is connected and the others are either dropped or buffered.
- When processors access memories through a crossbar, this situation is called a memory access conflict.
- Given p as the probability of a request by a processor per cycle, and assuming that each of the N processors' requests is uniformly directed to all N memories, the average number of connections allowed per cycle, called bandwidth (BW), is BW = N{1 - (1 - p/N)**N}. Derive this!
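One way to see where the formula comes from: a given processor requests a given memory with probability p/N, so none of the N processors requests it with probability (1 - p/N)**N. That memory is therefore busy with probability 1 - (1 - p/N)**N, and summing over the N memories gives the expected number of connections. The sketch below compares the closed form with a small Monte Carlo run. It assumes that rejected requests are dropped rather than resubmitted. The names `crossbar_bw` and `simulate_bw` are illustrative.

```python
import random


def crossbar_bw(n: int, p: float) -> float:
    """Closed form: expected number of distinct memories requested per cycle."""
    return n * (1.0 - (1.0 - p / n) ** n)


def simulate_bw(n: int, p: float, cycles: int = 20000, seed: int = 1) -> float:
    """Monte Carlo estimate under the same assumptions (losers are dropped)."""
    rng = random.Random(seed)
    served = 0
    for _ in range(cycles):
        requested = set()
        for _ in range(n):
            if rng.random() < p:                 # this processor issues a request
                requested.add(rng.randrange(n))  # uniformly chosen memory
        served += len(requested)                 # one winner per requested memory
    return served / cycles


print(crossbar_bw(2, 1.0))  # 1.5
print(crossbar_bw(8, 0.5), simulate_bw(8, 0.5))
```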

Input-Buffered Switch
- Independent routing logic per input: FSM
- Scheduler logic arbitrates each output: priority, FIFO, random
- Head-of-line blocking problem: the head packet in a buffer cannot depart because the output is busy with another packet. The second packet may be destined to an output that is free, but it cannot depart because it is blocked by the first packet. One solution is to create multiple input queues, one per output, called virtual output queuing, which is adopted in most routers.
- Scheduler design: how to ensure the maximum number of simultaneous connections is a challenging research area.
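A single switch cycle is enough to show head-of-line blocking. In the sketch below, each input holds a list of destination outputs in arrival order. The FIFO variant may only send the head packet, while the VOQ variant may send any queued packet whose output is still free. Both functions use a simple greedy pass over the inputs, which is an assumption made for brevity and not a real scheduler.

```python
def fifo_cycle(queues: list[list[int]]) -> list[tuple[int, int]]:
    """One cycle with a single FIFO per input: only the head packet may leave."""
    taken, sent = set(), []
    for i, q in enumerate(queues):
        if q and q[0] not in taken:
            taken.add(q[0])
            sent.append((i, q[0]))
    return sent


def voq_cycle(queues: list[list[int]]) -> list[tuple[int, int]]:
    """Same traffic with per-output queues: any queued destination is eligible."""
    taken, sent = set(), []
    for i, q in enumerate(queues):
        for dest in q:                 # oldest packet first
            if dest not in taken:
                taken.add(dest)
                sent.append((i, dest))
                break                  # at most one packet per input per cycle
    return sent


traffic = [[0, 1], [0, 2]]   # input 0 holds packets for outputs 0,1; input 1 for 0,2
print(fifo_cycle(traffic))   # [(0, 0)]
print(voq_cycle(traffic))    # [(0, 0), (1, 2)]
```

With FIFO queues, input 1 is stalled behind its head packet for output 0 even though output 2 is idle. With VOQs, the packet for output 2 departs in the same cycle.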

Problems with Input-Buffered Switch
- FIFO input buffers give rise to the head-of-line (HOL) blocking problem.
- Current routers employ a separate input queue for each output, called a virtual output queue (VOQ).
- How, then, should packets from the different VOQs be scheduled for transmission?

VOQ-based Input Buffered Switch

Scheduling in Input Buffered Switch
- n independent arbitration problems? Static priority, random, round-robin
- Simplifications due to routing algorithm?
- General case is maximum bipartite matching: iterative algorithms, iSLIP in Cisco

Iterative Matching: A 3-Step Procedure
- Request stage: each input sends a request to every output for which it has cells.
- Grant stage: each output chooses one of the requests it received and sends a grant signal to that input.
- Accept stage: each input sends an accept signal to only one of the outputs offering grants.
[Figure: Request, Grant, Accept]
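The three stages can be sketched as a loop that repeats until no new grants are issued. This is a simplified illustration in the spirit of round-robin iterative matching, not the iSLIP implementation: the round-robin pointers are passed in and never updated, and the function name `iterative_match` and its parameters are assumptions for the example.

```python
def iterative_match(requests, n, iterations=None, grant_ptr=None, accept_ptr=None):
    """Match inputs to outputs for one cell time.

    requests[i] is the set of outputs for which input i has cells.
    Returns a dict {input: output}.
    """
    grant_ptr = grant_ptr or [0] * n     # per-output pointer over inputs
    accept_ptr = accept_ptr or [0] * n   # per-input pointer over outputs
    rounds = n if iterations is None else iterations
    match, matched_outputs = {}, set()
    for _ in range(rounds):
        # Request: every unmatched input asks every unmatched output it wants.
        reqs = {o: [] for o in range(n)}
        for i in range(n):
            if i in match:
                continue
            for o in requests[i]:
                if o not in matched_outputs:
                    reqs[o].append(i)
        # Grant: each output picks the requester nearest its pointer.
        grants = {}
        for o, inputs in reqs.items():
            if inputs:
                chosen = min(inputs, key=lambda i: (i - grant_ptr[o]) % n)
                grants.setdefault(chosen, []).append(o)
        if not grants:
            break
        # Accept: each input picks the granting output nearest its pointer.
        for i, outs in grants.items():
            chosen = min(outs, key=lambda o: (o - accept_ptr[i]) % n)
            match[i] = chosen
            matched_outputs.add(chosen)
    return match


print(iterative_match([{0, 1}, {0, 1}], 2, iterations=1))  # {0: 0}
print(iterative_match([{0, 1}, {0, 1}], 2))                # {0: 0, 1: 1}
print(iterative_match([{0, 1}, {0}, {1, 2}], 3))           # {0: 0, 2: 2}
```

The last call leaves input 1 unmatched even though a matching of size 3 exists (0 to 1, 1 to 0, 2 to 2). The procedure finds a maximal matching, which is not necessarily a maximum one.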

Output/Shared Buffered Switch
- An output-buffered switch has buffers at the outputs to store packets. There is always a minimal transmitting buffer at the input.
- What happens if 2 or more packets arrive for the same output at the same time? To capture all of them, the switch (RAM) speed has to be N times the link speed, which is difficult to design.

Shared Buffer Switch: IBM SP Vulcan Switch
- Many gigabit Ethernet switches use a similar design without the cut-through
- 128 8-byte ‘chunks’ in central queue, LRU per output

SGI SPIDER: IEEE Micro Jan 1997

Multistage Interconnection Network
A network consisting of multiple stages of crossbar switches has the following properties:
- NxN network for N = 2**n
- Consists of log2 N stages of 2x2 switches
- Has N/2 2x2 switches per stage
- Cost O(N log N) instead of O(N**2) for a crossbar
- For N = a**n, a MIN can be similarly designed with axa switches
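A quick count makes the cost gap concrete. The sketch below uses only the figures stated above (log2 N stages, N/2 switches per stage, N**2 crosspoints for a crossbar). The function names are illustrative, and the count ignores wiring and control cost.

```python
def crossbar_crosspoints(n: int) -> int:
    """Crosspoints in an n x n crossbar."""
    return n * n


def min_switches(n: int) -> int:
    """Number of 2x2 switches in an n x n multistage network, n a power of two."""
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two, at least 2")
    stages = n.bit_length() - 1      # log2(n)
    return stages * (n // 2)


for n in (8, 64, 1024):
    print(n, crossbar_crosspoints(n), min_switches(n))
```

For N = 1024 this gives 1,048,576 crosspoints for the crossbar against 5,120 2x2 switches for the multistage network.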

Multistage Interconnection Networks
[Figure: Omega network with ports labeled 0-7 (000-111)]
Omega network complexity: O(N log2 N)

Perfect Shuffle
[Figure: (a) perfect shuffle and (b) inverse perfect shuffle on addresses 000-111 (0-7)]
Shuffle interconnection: S(a_{n-1} a_{n-2} … a_1 a_0) = (a_{n-2} a_{n-3} … a_0 a_{n-1})
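The shuffle is a one-bit left rotation of the n-bit address, and the inverse shuffle is the matching right rotation. A minimal sketch, with hypothetical function names:

```python
def shuffle(addr: int, n: int) -> int:
    """Perfect shuffle: rotate the n-bit address left by one bit."""
    mask = (1 << n) - 1
    return ((addr << 1) | (addr >> (n - 1))) & mask


def inverse_shuffle(addr: int, n: int) -> int:
    """Inverse perfect shuffle: rotate the n-bit address right by one bit."""
    return (addr >> 1) | ((addr & 1) << (n - 1))


print([shuffle(a, 3) for a in range(8)])          # [0, 2, 4, 6, 1, 3, 5, 7]
print([inverse_shuffle(a, 3) for a in range(8)])  # [0, 4, 1, 5, 2, 6, 3, 7]
```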

Omega Network
- Every stage of switches is preceded by a perfect shuffle interconnection: S(a_{n-1} a_{n-2} … a_1 a_0) = (a_{n-2} a_{n-3} … a_0 a_{n-1}).
- An input can be connected to the straight or the exchange output in a 2x2 switch: E(a_{n-1} a_{n-2} … a_1 a_0) = (a_{n-1} a_{n-2} … a_1 ā_0).
- To route a message/packet in an Omega network, the destination tag, which is the binary representation of the destination (d_{n-1} d_{n-2} … d_1 d_0), is used.
- The ith bit d_i controls the routing at the ith stage counted from the right, with 0 <= i <= n-1. If d_i = 0, the input is connected to the upper output. If d_i = 1, it is connected to the lower output.
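Destination-tag routing can be traced stage by stage: apply the shuffle, then let the current tag bit pick the upper (0) or lower (1) switch output, which sets the low bit of the position. The sketch below is an illustration under that model. The name `omega_route` and the tuple layout it returns are assumptions.

```python
def omega_route(src: int, dst: int, n: int) -> list[tuple[int, int, int]]:
    """Trace a packet through an n-stage Omega network.

    Returns one (stage, tag_bit, position) tuple per stage, where stage 0
    is the leftmost stage and is controlled by the MSB of dst.
    """
    mask = (1 << n) - 1
    pos, hops = src, []
    for stage in range(n):
        pos = ((pos << 1) | (pos >> (n - 1))) & mask   # perfect shuffle
        bit = (dst >> (n - 1 - stage)) & 1             # tag bit for this stage
        pos = (pos & ~1) | bit                         # 0 = upper, 1 = lower output
        hops.append((stage, bit, pos))
    return hops


for stage, bit, pos in omega_route(0b010, 0b110, 3):
    print(stage, bit, format(pos, "03b"))
# 0 1 101
# 1 1 011
# 2 0 110
```

After the last stage the position equals the destination address, whatever the source was.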

Self Routing
- A processor generates a tag that is the binary representation of the destination.
- The MSB controls the leftmost stage and the LSB controls the rightmost stage of the Omega network.
- A small controller inside the 2x2 switch senses this bit and enables the connection.
- If bit c_i = 0, the request is to the upper output; if it is 1, the request is to the lower output.
- Routing is based on a digit if the switch size is greater than 2.
- Network conflict: select round robin.
- Less bandwidth than a crossbar, but more cost effective.
- What about QoS? Future research.
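Network conflicts are the reason an Omega network delivers less bandwidth than a crossbar: two packets conflict when they need the same switch output in the same stage. The sketch below detects such clashes for a set of (source, destination) pairs. It models each switch output as a link named by (stage, position), which is an assumption of this example, as are the names `link_sequence` and `find_conflicts`.

```python
def link_sequence(src: int, dst: int, n: int) -> list[tuple[int, int]]:
    """Switch-output links used by one packet, as (stage, position) pairs."""
    mask = (1 << n) - 1
    pos, links = src, []
    for stage in range(n):
        pos = ((pos << 1) | (pos >> (n - 1))) & mask        # perfect shuffle
        pos = (pos & ~1) | ((dst >> (n - 1 - stage)) & 1)   # tag bit picks output
        links.append((stage, pos))
    return links


def find_conflicts(pairs, n):
    """Return (link, first_owner, contender) for every contested link."""
    owner, clashes = {}, []
    for src, dst in pairs:
        for link in link_sequence(src, dst, n):
            if link in owner:
                clashes.append((link, owner[link], (src, dst)))
            else:
                owner[link] = (src, dst)
    return clashes


identity = [(i, i) for i in range(8)]
print(find_conflicts(identity, 3))           # []
print(find_conflicts([(0, 0), (4, 1)], 3))   # two contested links
```

Inputs 0 and 4 enter the same first-stage switch and both need its upper output, so only one of them can proceed in that cycle. A crossbar would serve both requests, since their destinations differ.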

Theorem: The Omega Network Is Self Routing
- Let the source be (s_{n-1} s_{n-2} … s_1 s_0) and the destination be (d_{n-1} d_{n-2} … d_1 d_0).
- Before stage 1, the source is switched to position (s_{n-2} s_{n-3} … s_0 s_{n-1}) by the perfect shuffle connection. After stage 1, it is switched to (s_{n-2} s_{n-3} … s_0 d_{n-1}), as per the (n-1)th bit of the destination.
- Before the 2nd stage of switches, the source is connected to (s_{n-3} … s_0 d_{n-1} s_{n-2}), and after the 2nd stage it becomes (s_{n-3} … s_0 d_{n-1} d_{n-2}).
- Continuing like this for n stages, the position becomes (d_{n-1} d_{n-2} … d_1 d_0), which is the destination.

Example: SP
- 8-port switch, 40 MB/s per link, 8-bit phit, 16-bit flit, single 40 MHz clock
- Packet switching, cut-through, no virtual channels, source-based routing
- Variable packet <= 255 bytes, 31-byte FIFO per input, 7 bytes per output, 16-phit links

Summary
- Routing algorithms restrict the set of routes within the topology
  - A simple mechanism selects the turn at each hop
  - Arithmetic, selection, lookup
- Deadlock-free if the channel dependence graph is acyclic
  - Limit turns to eliminate dependences
  - Add separate channel resources to break dependences
  - Combination of topology, algorithm, and switch design
- Deterministic vs. adaptive routing
- Switch design issues
  - Input/output/pooled buffering, routing logic, selection logic
- Flow control
- Real networks are a ‘package’ of design choices
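The deadlock condition in the summary can be checked mechanically: build the channel dependence graph (an edge from channel A to channel B whenever a route may hold A while requesting B) and test it for cycles. The sketch below is a plain depth-first cycle check. The channel names and the four-channel ring are hypothetical examples, not taken from the slides.

```python
def has_cycle(deps: dict[str, list[str]]) -> bool:
    """Return True if the channel dependence graph contains a cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color: dict[str, int] = {}

    def visit(channel: str) -> bool:
        color[channel] = GRAY                      # on the current DFS path
        for nxt in deps.get(channel, []):
            state = color.get(nxt, WHITE)
            if state == GRAY:                      # back edge: cycle found
                return True
            if state == WHITE and visit(nxt):
                return True
        color[channel] = BLACK                     # fully explored
        return False

    return any(color.get(c, WHITE) == WHITE and visit(c) for c in deps)


# Unidirectional ring of four channels: each route may wait on the next channel.
ring = {"c0": ["c1"], "c1": ["c2"], "c2": ["c3"], "c3": ["c0"]}
print(has_cycle(ring))    # True: deadlock is possible

# Forbidding the c3 -> c0 dependence (for example by a turn restriction or an
# extra channel) leaves an acyclic graph.
broken = {"c0": ["c1"], "c1": ["c2"], "c2": ["c3"], "c3": []}
print(has_cycle(broken))  # False
```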