Dynamic Networks CS 213, LECTURE 15 L.N. Bhuyan CS258 S99.

What is a Dynamic Network?
A dynamic network can connect any input to any output by enabling or disabling switches in the network.
Examples:
- Shared bus: the bus arbiter connects a processor to a memory.
- Crossbar: consists of many switching elements, which can be enabled to connect many inputs to many outputs simultaneously.
- Multistage network: consists of several stages of switches that are enabled to set up connections.
- The nodes in static networks (such as a mesh) also contain dynamic crossbars.
4/22/2017 CS258 S99

Crossbar Switch Design
Complexity: O(N**2) for an N x N crossbar.

How do you build a crossbar?
N**2 switches => cost O(N**2).
Time taken by the arbiter = O(N**2).
The multiplexors are controlled by the arbiter/controller/scheduler.

Crossbar Contd.
An N x N crossbar allows all N inputs to be connected simultaneously to all N outputs.
It allows all one-to-one mappings, called permutations. Number of permutations = N!
When two or more inputs request the same output, it is called a CONFLICT. Only one of them is connected; the others are either dropped or buffered.
When processors access memories through a crossbar, this situation is called a memory access conflict.
Given p as the probability that a processor issues a request in a cycle, and assuming that each of the N processors' requests is uniformly distributed over all N memories, the average number of connections allowed per cycle, called the bandwidth (BW), is
BW = N{1 - (1 - p/N)**N}
Derive this!
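One way to see where the BW formula comes from: a given memory is requested by a given processor with probability p/N, so it receives no request with probability (1 - p/N)**N and is busy with probability 1 - (1 - p/N)**N; summing over N memories gives BW. The sketch below checks this numerically. It assumes rejected requests are simply dropped (not resubmitted), and the function names `crossbar_bw` and `simulate_bw` are illustrative, not from the lecture.

```python
import random


def crossbar_bw(n: int, p: float) -> float:
    """Analytic bandwidth BW = N * (1 - (1 - p/N)**N)."""
    return n * (1.0 - (1.0 - p / n) ** n)


def simulate_bw(n: int, p: float, cycles: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate: average number of distinct memories requested per cycle.

    Assumption: a conflicting request is dropped, so every cycle is independent.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(cycles):
        requested = set()
        for _ in range(n):
            if rng.random() < p:              # this processor issues a request
                requested.add(rng.randrange(n))  # uniformly chosen memory
        total += len(requested)               # one connection per busy memory
    return total / cycles


if __name__ == "__main__":
    for n, p in [(4, 1.0), (8, 0.5), (16, 1.0)]:
        print(n, p, round(crossbar_bw(n, p), 3), round(simulate_bw(n, p), 3))
```

The simulated and analytic values should agree closely for any N and p under these assumptions.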

Input-Buffered Switch
Independent routing logic per input.
Scheduler logic arbitrates each output: priority, FIFO, or random.
Head-of-line blocking problem: the head packet in a buffer cannot depart because its output is busy with another packet. The second packet may be destined to a free output, but it cannot depart because it is blocked by the first packet.
=> One solution is to create multiple input queues, one per output, called virtual output queuing, which is adopted in most routers.
Scheduler design: how to ensure the maximum number of simultaneous connections is a challenging research area.
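A minimal sketch of head-of-line blocking for a single switch cycle. Each input queue is modeled as a list of destination output numbers. `fifo_cycle` lets only the head packet depart, while `voq_cycle` lets an input send any queued packet whose output is free, which is the effect of keeping one queue per output. Both function names and the greedy, lowest-input-first arbitration are assumptions made for illustration; a real scheduler would use a matching algorithm.

```python
from typing import List


def fifo_cycle(queues: List[List[int]]) -> int:
    """Packets delivered in one cycle when only the head of each input queue may depart."""
    busy_outputs = set()
    sent = 0
    for q in queues:
        if q and q[0] not in busy_outputs:
            busy_outputs.add(q.pop(0))   # head packet wins its output
            sent += 1
    return sent


def voq_cycle(queues: List[List[int]]) -> int:
    """Packets delivered in one cycle when an input may send any queued packet (greedy)."""
    busy_outputs = set()
    sent = 0
    for q in queues:
        for k, dest in enumerate(q):
            if dest not in busy_outputs:
                busy_outputs.add(dest)
                del q[k]                 # send the first packet whose output is free
                sent += 1
                break                    # at most one packet per input per cycle
    return sent


if __name__ == "__main__":
    # Both inputs have a head packet for output 0; input 1 also holds a packet for output 2.
    print(fifo_cycle([[0, 1], [0, 2]]))  # input 1 is blocked behind its head packet
    print(voq_cycle([[0, 1], [0, 2]]))   # input 1 sends its packet for output 2 instead
```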

Problems with Input-Buffered Switch
FIFO input buffers give rise to the head-of-line (HOL) blocking problem.
Current routers employ a separate input queue for each output, called a virtual output queue (VOQ).
How, then, should packets from the different VOQs be scheduled for transmission?

VOQ-based Input Buffered Switch

Scheduling in Input Buffered Switch
n independent arbitration problems?
- static priority, random, round-robin
Simplifications due to routing algorithm?
General case is maximum bipartite matching
- Iterative algorithms: iSLIP (Cisco)
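The iterative matching idea can be sketched as a request/grant/accept loop with round-robin pointers, in the spirit of iSLIP. This is a simplified illustration under stated assumptions, not Cisco's implementation: `requests[i]` is the set of outputs for which input i has a non-empty VOQ, pointers advance only when a grant is accepted in the first iteration, and the names `rr_pick` and `islip_like_match` are hypothetical.

```python
from typing import Dict, List, Set


def rr_pick(candidates: Set[int], pointer: int, n: int) -> int:
    """Return the first candidate at or after `pointer`, wrapping around modulo n."""
    for offset in range(n):
        c = (pointer + offset) % n
        if c in candidates:
            return c
    raise ValueError("no candidates")


def islip_like_match(requests: List[Set[int]], grant_ptr: List[int],
                     accept_ptr: List[int], iterations: int = 2) -> Dict[int, int]:
    """Compute an input -> output matching for one cycle; pointers are updated in place."""
    n = len(requests)
    match: Dict[int, int] = {}
    matched_outputs: Set[int] = set()
    for it in range(iterations):
        # Grant phase: each unmatched output grants one requesting unmatched input.
        grants: Dict[int, Set[int]] = {}
        for out in range(n):
            if out in matched_outputs:
                continue
            reqs = {i for i in range(n) if i not in match and out in requests[i]}
            if reqs:
                grants.setdefault(rr_pick(reqs, grant_ptr[out], n), set()).add(out)
        if not grants:
            break
        # Accept phase: each input accepts one of the grants it received.
        for i, outs in grants.items():
            out = rr_pick(outs, accept_ptr[i], n)
            match[i] = out
            matched_outputs.add(out)
            if it == 0:  # advance pointers past the accepted pair
                grant_ptr[out] = (i + 1) % n
                accept_ptr[i] = (out + 1) % n
    return match


if __name__ == "__main__":
    reqs = [{0, 1}, {0}, {2}]
    gp, ap = [0, 0, 0], [0, 0, 0]
    print(islip_like_match(reqs, gp, ap))  # first cycle: maximal but not maximum
    print(islip_like_match(reqs, gp, ap))  # second cycle: pointers have rotated
```

With the same requests in two successive cycles, the first cycle leaves input 1 unmatched, and the rotated pointers let all three inputs be matched in the second. This shows why the result is a maximal matching rather than a maximum one.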

Output Buffered Switch
How would you build a shared pool?

Output scheduling
n independent arbitration problems?
- static priority, random, round-robin
Simplifications due to routing algorithm?
General case is maximum bipartite matching

Multistage Interconnection Network (MIN)
A crossbar switch is not scalable. How about a network consisting of multiple stages of small crossbar switches? It has the following properties:
- N x N network for N = 2**n
- Consists of log2 N stages of 2 x 2 switches
- Has N/2 2 x 2 switches per stage
- Cost O(N log2 N) instead of O(N**2) for a crossbar
For N = a**n, a MIN can be similarly designed with a x a switches.
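The cost comparison can be made concrete with a small calculation. The sketch below counts a x a switches in a MIN (stages times switches per stage) and crosspoints in a crossbar; the function names are illustrative, and counting a*a crosspoints per a x a switch is an assumption used only to put both designs in the same unit.

```python
def crossbar_crosspoints(n: int) -> int:
    """Crosspoints in an N x N crossbar."""
    return n * n


def min_switches(n: int, a: int = 2) -> int:
    """Number of a x a switches in an N x N MIN, where N must be a power of a."""
    stages, size = 0, 1
    while size < n:
        size *= a
        stages += 1
    if size != n:
        raise ValueError("N must be a power of a")
    return stages * (n // a)  # log_a(N) stages, N/a switches per stage


def min_crosspoints(n: int, a: int = 2) -> int:
    """Crosspoints in the MIN, assuming each a x a switch is a small crossbar."""
    return min_switches(n, a) * a * a


if __name__ == "__main__":
    for n in (8, 64, 1024):
        print(n, crossbar_crosspoints(n), min_switches(n), min_crosspoints(n))
```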

Multistage interconnection networks
[Figure: 8 x 8 Omega network; ports are numbered 0 to 7 with binary labels 000 to 111.]
Complexity: Omega network complexity is O(N log2 N).
Self routing: The source node generates a tag, which is the binary equivalent of the destination. At each switch, the corresponding tag bit is checked. If the bit is 0, the input is connected to the upper output. If it is 1, the input is connected to the lower output. If both inputs carry the same bit (both 0 or both 1), there is a switch conflict. One of them is connected; the other is rejected or buffered at the switch (if it has a buffer => buffered crossbar).

What is Shuffle?
[Figure: (a) perfect shuffle and (b) inverse perfect shuffle interconnections on 8 nodes labeled 000 (=0) to 111 (=7).]
Perfect shuffle interconnection: S(a_{n-1} a_{n-2} ... a_1 a_0) = (a_{n-2} a_{n-3} ... a_0 a_{n-1})
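The perfect shuffle is a one-bit left rotation of the n-bit address, and the inverse shuffle is the matching right rotation. A minimal sketch, with function names chosen here for illustration:

```python
def perfect_shuffle(addr: int, n_bits: int) -> int:
    """Rotate the n-bit address left by one: (a_{n-1} ... a_0) -> (a_{n-2} ... a_0 a_{n-1})."""
    msb = (addr >> (n_bits - 1)) & 1
    return ((addr << 1) & ((1 << n_bits) - 1)) | msb


def inverse_shuffle(addr: int, n_bits: int) -> int:
    """Rotate the n-bit address right by one, undoing perfect_shuffle."""
    lsb = addr & 1
    return (addr >> 1) | (lsb << (n_bits - 1))


if __name__ == "__main__":
    for i in range(8):
        s = perfect_shuffle(i, 3)
        print(f"{i:03b} -> {s:03b}")
```

For 8 nodes this maps 0..7 to 0, 2, 4, 6, 1, 3, 5, 7, which is the interleaving of a card shuffle.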

Omega Network
Every stage of switches is preceded by a perfect shuffle interconnection:
S(a_{n-1} a_{n-2} ... a_1 a_0) = (a_{n-2} a_{n-3} ... a_0 a_{n-1})
An input can be connected to a straight or exchange output in a 2 x 2 switch:
E(a_{n-1} a_{n-2} ... a_1 a_0) = (a_{n-1} a_{n-2} ... a_1 ā_0)
To route a message/packet in an Omega network, the destination tag (d_{n-1} d_{n-2} ... d_1 d_0), which is the binary equivalent of the destination, is used. The ith bit d_i controls the routing at the ith stage counted from the right, with 0 <= i <= n-1. If d_i = 0, the input is connected to the upper output. If d_i = 1, it is connected to the lower output.

Self Routing
A processor generates a tag that is the binary equivalent of the destination.
The MSB controls the leftmost stage and the LSB controls the rightmost stage of the Omega network.
A small controller inside the 2 x 2 switch senses this bit and enables the connection.
If bit c_i = 0, the request is to the upper output; if it is 1, the request is to the lower output.
Routing is based on a digit if the switch size is greater than 2.
Network conflict: select a winner round-robin.
Less bandwidth than a crossbar, but more cost-effective.
What about QoS? Future research.

Theorem: The Omega network is self routing
Let the source be (s_{n-1} s_{n-2} ... s_1 s_0) and the destination be (d_{n-1} d_{n-2} ... d_1 d_0).
Before stage 1, the source is switched to position (s_{n-2} s_{n-3} ... s_0 s_{n-1}) by the perfect shuffle connection.
After stage 1, it is switched to (s_{n-2} s_{n-3} ... s_0 d_{n-1}), as per bit n-1 of the destination.
Before the 2nd stage of switches, the source is connected to (s_{n-3} ... s_0 d_{n-1} s_{n-2}), and after the 2nd stage it becomes (s_{n-3} ... s_0 d_{n-1} d_{n-2}).
Continuing like this for n stages, the source position becomes (d_{n-1} d_{n-2} ... d_1 d_0), which is the destination.
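The argument of the theorem can be traced in code: each stage applies a perfect shuffle and then overwrites the least significant bit with the next destination bit, MSB first. The sketch below follows a single packet with no competing traffic, so conflicts are not modeled; `omega_route` is a name chosen for illustration.

```python
from typing import List


def omega_route(src: int, dst: int, n_bits: int) -> List[int]:
    """Return the packet's position after each stage of an Omega network with 2**n_bits ports."""
    mask = (1 << n_bits) - 1
    pos = src
    path = []
    for stage in range(n_bits):
        # Perfect shuffle before the stage: rotate the position left by one bit.
        pos = ((pos << 1) & mask) | (pos >> (n_bits - 1))
        # The 2 x 2 switch sets the low bit: 0 = upper output, 1 = lower output.
        tag_bit = (dst >> (n_bits - 1 - stage)) & 1
        pos = (pos & ~1) | tag_bit
        path.append(pos)
    return path


if __name__ == "__main__":
    print(omega_route(0b101, 0b010, 3))
    # Exhaustive check of the theorem for an 8 x 8 network.
    print(all(omega_route(s, d, 3)[-1] == d for s in range(8) for d in range(8)))
```

After n stages every source bit has been rotated out and replaced by a destination bit, so the final position equals the destination regardless of the source.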

Switch Size a x a
Let N = a**n.
The MIN will consist of n stages of a x a crossbar switches with N/a switches per stage.
Routing is based on radix-a digits, each with a value between 0 and a-1.
Interconnection is based on the a-shuffle.
Homework:
- Prove self routing based on radix a.
- Draw a 16 x 16 MIN based on 4 x 4 switches and explain its operation.
- Derive the BW of an Omega network with N = a**n, with the same input parameters as the crossbar (Slide 5).
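For the bandwidth question, one commonly used approximation (an assumption here, not something stated on the slides) treats the requests arriving at each a x a switch as independent and uniformly distributed over its outputs. Each stage is then a small crossbar, so the crossbar formula can be applied stage by stage to the per-link request rate. The function name `min_bandwidth` is illustrative.

```python
def min_bandwidth(a: int, n_stages: int, p: float) -> float:
    """Approximate BW of a MIN with N = a**n_stages ports built from a x a switches.

    Assumption: requests at every stage are independent and uniform, and
    rejected requests are dropped. `rate` is the probability that a link
    carries a request in a cycle.
    """
    rate = p
    for _ in range(n_stages):
        # Same reasoning as the crossbar formula, applied to one a x a switch output.
        rate = 1.0 - (1.0 - rate / a) ** a
    return (a ** n_stages) * rate


if __name__ == "__main__":
    n_ports = 8
    crossbar = n_ports * (1.0 - (1.0 - 1.0 / n_ports) ** n_ports)
    print(round(min_bandwidth(2, 3, 1.0), 4), round(crossbar, 4))
```

Under these assumptions the 8 x 8 network of 2 x 2 switches delivers less bandwidth than the 8 x 8 crossbar at p = 1, consistent with the earlier remark that a MIN trades bandwidth for cost.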

Example: SP
8-port switch, 40 MB/s per link, 8-bit phit, 16-bit flit, single 40 MHz clock
Packet switching, cut-through, no virtual channels, source-based routing
Variable packet size <= 255 bytes, 31-byte FIFO per input, 7 bytes per output, 16-phit links

Example: IBM SP Vulcan switch
Many gigabit Ethernet switches use a similar design without the cut-through.

SGI SPIDER Chip

SPIDER OPERATION
The physical transmission layer for each port is based on a pair of Source Synchronous Drivers and Receivers (SSD and SSR), which transmit and receive 20 data bits and a data framing signal at 400 MBaud.
The data link level guarantees reliable transmission using a CCITT-CRC code with a go-back-n sliding window protocol [1] retry mechanism, and is referred to as the Link Level Protocol (LLP).
The message layer defines 4 virtual channels and a credit-based flow control scheme to support arbitrary message lengths, as well as a header format to specify message destination, priority, and congestion control options.
The receive buffers of a port maintain a separate linked list of messages for each of the 5 possible output ports for each virtual channel to avoid the 'block at head of queue' bottleneck.

SPIDER Crossbar Arbitration
To maximize bandwidth through the crossbar without using unreasonable buffering, each virtual channel buffer is organized as a set of linked lists. There is one linked list for each possible output port for each virtual channel. This solution avoids the block at head of queue problem.
To maximize crossbar efficiency, each virtual channel from each port can request arbitration for every possible destination. Each arbitration cycle, the arbiter chooses up to 6 winners from as many as 120 arbitration candidates to maximize crossbar utilization.
Messages accumulate a network age as they are routed, increasing their priority to avoid starvation and promote network fairness. In order to avoid starvation and encourage network fairness, the arbiter is rotated each arbitration cycle to favor the highest-priority requestor. Priority is based on the age field of a message header.
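The candidate count follows from the numbers given: 6 ports, 4 virtual channels per port, and 5 possible output ports per channel give 6 x 4 x 5 = 120 requests. The sketch below illustrates age-based arbitration with a simple greedy rule (oldest request first, at most one winner per input port and per output port). This is an assumed simplification for illustration only; the actual SPIDER arbiter, its rotation scheme, and the `Request` fields are not specified in the text above.

```python
from typing import List, NamedTuple


class Request(NamedTuple):
    in_port: int   # port whose receive buffer holds the message
    vc: int        # virtual channel within that port
    out_port: int  # requested exit port
    age: int       # network age from the message header; larger means higher priority


def arbitrate(requests: List[Request]) -> List[Request]:
    """Greedy age-based arbitration: at most one winner per input port and per output port."""
    winners: List[Request] = []
    used_in, used_out = set(), set()
    # sorted() is stable, so equal-age requests keep their original order.
    for r in sorted(requests, key=lambda r: -r.age):
        if r.in_port not in used_in and r.out_port not in used_out:
            winners.append(r)
            used_in.add(r.in_port)
            used_out.add(r.out_port)
    return winners


if __name__ == "__main__":
    reqs = [Request(0, 0, 1, 5), Request(1, 0, 1, 9),
            Request(0, 1, 2, 3), Request(2, 0, 0, 1)]
    for w in arbitrate(reqs):
        print(w)
```

In the example, port 0 loses output 1 to the older message from port 1 but still wins output 2 through its other virtual channel, which is the benefit of letting every virtual channel request every destination.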

Arbitration Contd.
After data is received by the SSR and synchronized, it enters the chip core and begins several operations in parallel.
Table lookup and crossbar arbitration are normally serialized, as the exit port must be known before arbitration begins. To parallelize these operations, table lookup is pipelined across SPIDER chips. While arbitration progresses, the table lookup is performed for the next SPIDER chip, which depends on the destination ID and the direction field.
This does increase table size, as a full table is required for each neighboring SPIDER chip, but it reduces latency by a full clock. Pipelined tables also add flexibility to possible routes, as different exit ports can be given depending on where a message came from as well as where it is going.

Summary
Routing algorithms restrict the set of routes within the topology
- simple mechanism selects turn at each hop
- arithmetic, selection, lookup
Deadlock-free if channel dependence graph is acyclic
- limit turns to eliminate dependences
- add separate channel resources to break dependences
- combination of topology, algorithm, and switch design
Deterministic vs. adaptive routing
Switch design issues
- input/output/pooled buffering, routing logic, selection logic
Flow control
Real networks are a 'package' of design choices
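The deadlock condition in the summary can be checked mechanically: build the channel dependence graph (an edge from channel u to channel v whenever the routing algorithm lets a packet holding u request v) and test it for cycles. A minimal sketch using depth-first search; the channel names and the two example graphs are hypothetical.

```python
from typing import Dict, List


def has_cycle(graph: Dict[str, List[str]]) -> bool:
    """Return True if the directed channel dependence graph contains a cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    color: Dict[str, int] = {}

    def visit(u: str) -> bool:
        color[u] = GRAY
        for v in graph.get(u, []):
            c = color.get(v, WHITE)
            if c == GRAY:                 # back edge: v is on the current path
                return True
            if c == WHITE and visit(v):
                return True
        color[u] = BLACK
        return False

    return any(color.get(u, WHITE) == WHITE and visit(u) for u in list(graph))


if __name__ == "__main__":
    # Unidirectional 4-node ring: each channel can wait on the next one around the ring.
    ring = {"c0": ["c1"], "c1": ["c2"], "c2": ["c3"], "c3": ["c0"]}
    # Same channels with the wraparound dependence removed (for example, by a routing restriction).
    restricted = {"c0": ["c1"], "c1": ["c2"], "c2": ["c3"], "c3": []}
    print(has_cycle(ring), has_cycle(restricted))
```

The ring has a cyclic dependence and can deadlock; removing one dependence, by limiting turns or by adding a separate channel, makes the graph acyclic.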