Switches and indirect networks. Computer Architecture. AMANO, Hideharu. Textbook pp. 92~130.



Switch connected parallel machines: a simple extension of bus connected machines. PU-Memory connection: UMA. Node-node connection: NUMA, NORA. Snooping is impossible, so directory based methods or compiler assisted methods are used for UMA/NUMA. How to build large scale systems.

Switch connected UMA (figure labels: Switch, Local Memory, CPU, Interface, Main Memory). Local Memory is sometimes dispensable.

Switch connected UMA: Blocking (figure labels: Switch, Local Memory, CPU, Interface, Main Memory modules 0, 1, ..., n, Shared Memory).

Switch connected UMA: Interleaving (figure labels: Switch, Local Memory, CPU, Interface, Main Memory modules 0, 1, ..., n, Shared Memory). Size: double word or cache line.

Switch connected UMA with circular connection (figure labels: Switch, CPU, Interface, Main Memory). Main memory is used as a home memory. Interleaving is often difficult.

Switch connected NUMA (figure: symmetric multiprocessors connected by switching fabrics). Switching fabrics sometimes form a hierarchical structure → Fat Tree. Directory based cache coherence methods are used → CC-NUMA. Typical of recent high performance servers: SUN's or IBM's.

Switch based network. Single stage: Crossbar. Multi-stage: Symmetric: Multistage Interconnection Network; Asymmetric: Fat-tree, base-m n-cube → Direct interconnection network.

Crossbar switch (n x m). Cross point: a small switching element. The number of cross points: n x m. An extension of buses.

Non-blocking property (n x m crossbar). Packets to different destinations are conflict free.

Head Of Line (HOL) conflict (n x m crossbar; X marks the conflict in the figure). An arbiter is required for each bus. A buffer is required. The number of cross points is not dominant.

Input buffer switch (figure labels: Crossbar, Input buffer). One of the conflicting packets is selected. The others are stored in the input buffer.
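
The input buffer behaviour above can be sketched as a one-cycle simulation. This is only an illustration: the function name `crossbar_cycle` and the fixed-priority arbiter (lowest input index wins) are assumptions, not something the slides specify.

```python
from collections import deque

def crossbar_cycle(queues):
    """Advance an input-buffered crossbar by one cycle.

    queues: list of deques, one per input port, holding destination ports.
    Only the packet at the head of each queue may compete for an output.
    Returns {output_port: input_port} for the packets delivered this cycle.
    """
    granted = {}
    for inp, queue in enumerate(queues):
        # Assumed arbiter: fixed priority, the lowest input index wins.
        if queue and queue[0] not in granted:
            granted[queue[0]] = inp
    for inp in granted.values():
        queues[inp].popleft()  # delivered packets leave their input buffer
    return granted

queues = [deque([2, 1]), deque([2, 3])]
first = crossbar_cycle(queues)   # both heads want output 2, input 0 wins
second = crossbar_cycle(queues)
third = crossbar_cycle(queues)
```

In the first cycle the packet for output 3 waits behind the losing head packet although output 3 is idle, which is the head-of-line effect.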

Merit/demerit of Crossbars. Non-blocking property. Simple structure/control. The hardware for cross-points usually does not limit the system (fallacy of crossbars). Extension is difficult because of the pin limitation of LSIs. If enough pins can be used, a large crossbar can be constructed → Earth Simulator.

Earth Simulator (2002, NEC). Nodes 0 to 639, each with vector processors 0, 1, ..., 7 and 16GB of shared memory, are connected by a crossbar (16GB/s x 2). Peak performance: 40TFLOPS.

MIN (Multistage Interconnection Network). Multistage connected switching elements form a large switch. Symmetric. Smaller number of cross-points, high degree of expandability. Bandwidth is often degraded. Latency is stretched.

Classification of MIN. Blocking network: conflicts may occur even when destinations are different: NlogN type standard MIN, Π network. Rearrangeable: conflict free scheduling is possible: Benes network, Clos network (rearrangeable configuration). Non-blocking: conflict free without scheduling: Clos network (non-blocking configuration), Batcher-Banyan network.

Properties of MIN: throughput for random communication, permutation capability, partition capability, fault tolerance, routing.

Blocking Networks. Standard NlogN networks: Omega network, Generalized Cube, Baseline. Their pass through ratio (throughput) is the same. Π network.

Omega network. The number of switching elements (2x2, in this case) is (N/2) x log2 N.
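
As a quick check of the element count, the sketch below compares an Omega network built from 2x2 elements with a single N x N crossbar. The function names are hypothetical, and counting 4 cross points per 2x2 element is an assumption made only for this comparison.

```python
def omega_switch_count(n_ports):
    """Number of 2x2 switching elements in an N-input Omega network."""
    stages = n_ports.bit_length() - 1          # log2(N) for a power of two
    assert 1 << stages == n_ports, "N must be a power of two"
    return (n_ports // 2) * stages

def crosspoints(n_ports):
    """Return (cross points in the Omega network, cross points in an NxN crossbar)."""
    # Assumption: each 2x2 element is itself a tiny crossbar with 4 cross points.
    return 4 * omega_switch_count(n_ports), n_ports * n_ports

small = crosspoints(8)
large = crosspoints(1024)
```

The gap between the two counts grows quickly with N, which is the "smaller number of cross-points" property claimed for MINs.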

Perfect Shuffle: rotate the address bits to the left. 000→000, 001→010, 010→100, 011→110, 100→001, 101→011, 110→101, 111→111. Inverse Shuffle: rotate to the right.
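
The shuffle is just a bit rotation of the line address, which can be sketched as follows (function names are illustrative, not from the slides):

```python
def perfect_shuffle(addr, n_bits):
    """Rotate the n_bits-wide address one position to the left."""
    mask = (1 << n_bits) - 1
    return ((addr << 1) | (addr >> (n_bits - 1))) & mask

def inverse_shuffle(addr, n_bits):
    """Rotate the n_bits-wide address one position to the right."""
    return (addr >> 1) | ((addr & 1) << (n_bits - 1))

# Where each of the 8 lines 000..111 is sent by one perfect shuffle.
shuffled = [format(perfect_shuffle(a, 3), "03b") for a in range(8)]
```

For 8 lines, `shuffled` lists the images of 000 through 111 in order, and applying `inverse_shuffle` to each result gives back the original address.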

Destination Routing. Check the destination tag from the MSB: if 0, use the upper link, else use the lower link. Examples: 1→3 (tag 0 1 1), 5→6 (tag 1 1 0).
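
A minimal sketch of destination routing, assuming each stage consists of a perfect shuffle followed by a column of 2x2 elements whose upper output is the even line and lower output the odd line. The function name `omega_route` and this line numbering are assumptions for illustration.

```python
def omega_route(src, dst, n_bits):
    """Return the line address a packet occupies after each stage."""
    mask = (1 << n_bits) - 1
    addr, path = src, []
    for stage in range(n_bits):
        bit = (dst >> (n_bits - 1 - stage)) & 1   # tag bits are read MSB first
        # Shuffle (rotate left), then the switch replaces the lowest bit:
        # 0 selects the upper output, 1 selects the lower output.
        addr = ((addr << 1) & mask) | bit
        path.append(addr)
    return path

path_1_to_3 = omega_route(1, 3, 3)
path_5_to_6 = omega_route(5, 6, 3)
```

Each stage shifts one source bit out and one destination bit in, so after log2 N stages the address equals the destination regardless of the source.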

Blocking Property. Even for different destinations, multiple paths may conflict (X in the figure). Example: 0→0 and 4→2.
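
The example pair can be checked with a small sketch that follows both packets stage by stage and reports where they need the same link. It assumes the same model as before (perfect shuffle in front of every stage, 2x2 elements); the helper names are hypothetical.

```python
def omega_path(src, dst, n_bits):
    """Line address after each stage under destination routing."""
    mask = (1 << n_bits) - 1
    addr, path = src, []
    for stage in range(n_bits):
        bit = (dst >> (n_bits - 1 - stage)) & 1
        addr = ((addr << 1) & mask) | bit
        path.append(addr)
    return path

def first_conflict(pair_a, pair_b, n_bits):
    """Return the first stage whose output link both packets need, else None."""
    path_a = omega_path(*pair_a, n_bits)
    path_b = omega_path(*pair_b, n_bits)
    for stage, (a, b) in enumerate(zip(path_a, path_b)):
        if a == b:
            return stage
    return None

clash = first_conflict((0, 0), (4, 2), 3)      # the pair from the slide
no_clash = first_conflict((1, 3), (5, 6), 3)   # the two routing examples
```

Under these assumptions the pair 0→0 and 4→2 already collides at the first stage, while 1→3 and 5→6 never share a link.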

For using large switching elements (Delta network). With current technology, 8x8 (4x4) crossbars are advantageous. (Figure: two stages of 4x4 switching elements with ports 0 to 3; lines labeled 00, 01, 10, 11, 20, 21, 30, 31.) Shuffle connection is also used.

Omega network. The same connection is used for all stages. Destination routing. A lot of useful permutations are available. Problems with partitioning and expandability.

Generalized Cube. Links whose labels differ in exactly 1 bit are connected to the same switching element.

Routing in Generalized Cube. The source label and destination label are compared (Ex-Or). Same (0): Straight. Different (1): Exchange. 001→
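
The tag computation can be sketched as below. The function names are illustrative, and the order in which stages consume the tag bits (MSB first here) is an assumption; the slides only state that a 0 bit means straight and a 1 bit means exchange.

```python
def cube_routing_tag(src, dst):
    """Routing tag: exclusive-or of the source and destination labels."""
    return src ^ dst

def switch_settings(src, dst, n_bits):
    """One setting per tag bit, listed from MSB to LSB (assumed stage order)."""
    tag = cube_routing_tag(src, dst)
    return ["exchange" if (tag >> b) & 1 else "straight"
            for b in reversed(range(n_bits))]

def apply_settings(src, settings):
    """Follow the settings: an exchange flips the label bit handled by that stage."""
    n_bits = len(settings)
    addr = src
    for i, setting in enumerate(settings):
        if setting == "exchange":
            addr ^= 1 << (n_bits - 1 - i)
    return addr

settings_1_to_3 = switch_settings(1, 3, 3)
```

Because each exchange flips exactly one label bit, following the tag always ends at the destination label.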

Partitioning The communication in the upper half never disturbs the lower half.

Expandability. A network of a given size can be used as an element of larger networks.

Generalized Cube. Destination routing cannot be applied. The routing tag is generated by the exclusive-or of the source label and the destination label. Partitioning. Expandability.

Baseline Network. The area of shuffling is changed from stage to stage (figure labels: bit shuffle, 2bit shuffle).

Destination Routing in Baseline network. Just like the Omega network (example tag: 1 1 0).

Partitioning in Baseline

Baseline network. Provides the benefits of both Omega and Generalized Cube: destination routing, partitioning, expandability. Used in NEC's Cenju.

Π network Tandem connection of two Omega networks

Bit reversal permutation (used in FFT). Conflicts occur in the Omega network. (Figure labels: 0 4 2 6 1 5 3 7; 0 1 2 3 4 5 6 7.)
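
To see the conflicts concretely, the sketch below routes the whole bit reversal permutation through an 8-input Omega network and lists every link requested by more than one packet. It assumes a perfect shuffle in front of every stage and destination routing; all names are hypothetical.

```python
from collections import Counter

def omega_path(src, dst, n_bits):
    """Line address after each stage under destination routing."""
    mask = (1 << n_bits) - 1
    addr, path = src, []
    for stage in range(n_bits):
        addr = ((addr << 1) & mask) | ((dst >> (n_bits - 1 - stage)) & 1)
        path.append(addr)
    return path

def bit_reverse(x, n_bits):
    """Reverse the n_bits-wide binary representation of x."""
    return int(format(x, f"0{n_bits}b")[::-1], 2)

def conflicting_links(dest_of, n_bits):
    """Return the sorted (stage, line) pairs claimed by more than one packet."""
    use = Counter()
    for src, dst in dest_of.items():
        for stage, line in enumerate(omega_path(src, dst, n_bits)):
            use[(stage, line)] += 1
    return sorted(link for link, count in use.items() if count > 1)

reversal = {s: bit_reverse(s, 3) for s in range(8)}
identity = {s: s for s in range(8)}
blocked = conflicting_links(reversal, 3)
clean = conflicting_links(identity, 3)
```

In this model the identity permutation passes without any shared link, while bit reversal produces shared links in the first two stages.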

Bit reversal permutation in Π network. (Figure labels: 0 4 2 6 1 5 3 7; 0 5 2 7 1 4 3 6.) The first Omega: upper input has priority. The next Omega: destination routing. Conflict free.

Permutation capacity. If all possible permutations are conflict free, the network is rearrangeable. A tandem connection of three Omega networks is rearrangeable. The tandem connection of Omega and Inverse Omega (Baseline and Inverse Baseline) is rearrangeable: Benes network.

Benes Network. Note that the center stage is shared. It is the rearrangeable network with the smallest hardware requirement.

Non-blocking network: Clos network. m>=n1+n2-1: Non-blocking. m>=n2: Rearrangeable. Else: Blocking.

Clos network (3-stage). Stage sizes: n1 x m (r1 elements), r1 x r2 (m elements), m x n2 (r2 elements). m=n1+n2-1: Non-blocking. m=n2: Rearrangeable. m<n2: Blocking. The number of intermediate stage elements dominates the permutation capability.
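
The three cases can be written as a small classifier. This sketch takes the thresholds in their usual inequality form (m at least n1+n2-1 for non-blocking, m at least n2 for rearrangeable), which is an assumption on top of the equalities shown on the slide; the function name is hypothetical.

```python
def classify_clos(n1, n2, m):
    """Classify a 3-stage Clos network by its number of middle-stage elements m.

    n1: inputs per first-stage element, n2: outputs per last-stage element.
    """
    if m >= n1 + n2 - 1:
        return "non-blocking"
    if m >= n2:
        return "rearrangeable"
    return "blocking"

examples = {m: classify_clos(4, 4, m) for m in (3, 4, 7)}
```

With n1 = n2 = 4, the configuration moves from blocking to rearrangeable at m = 4 and becomes non-blocking at m = 7.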

Batcher network: bitonic sorting network. Example: 5 7 0 4 2 1 3 6 → 5 7 4 0 1 2 6 3 → 0 4 5 7 6 3 2 1 → 0 1 2 3 4 5 6 7.
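
A compact software model of a bitonic sorting network, recording the sequence after each merge phase so it can be compared with the figure. This is a sketch of the standard iterative formulation; the function name and the snapshot list are illustrative additions.

```python
def bitonic_sort_phases(values):
    """Sort a power-of-two-length list with compare-exchange steps only.

    Returns the sequence after each merge phase (block sizes 2, 4, ..., N).
    """
    a = list(values)
    n = len(a)
    assert n and n & (n - 1) == 0, "length must be a power of two"
    phases = []
    k = 2
    while k <= n:                      # size of the blocks being merged
        j = k // 2
        while j >= 1:                  # distance between compared lines
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0   # block direction alternates
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        phases.append(list(a))
        k *= 2
    return phases

phases = bitonic_sort_phases([5, 7, 0, 4, 2, 1, 3, 6])
```

The comparison pattern depends only on the line indices, not on the data, which is what makes the sorter implementable as a fixed network of 2x2 elements.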

Batcher-Banyan. The Batcher network sorts the packets (5 7 0 4 2 1 3 6 → 5 7 4 0 1 2 6 3 → 0 4 5 7 6 3 2 1 → 0 1 2 3 4 5 6 7). Sorted input is conflict free in the banyan network (Omega, Baseline).
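
The conflict-free claim can be spot-checked for an Omega network used as the banyan stage. The sketch assumes the sorted packets have distinct destinations and are packed onto inputs 0, 1, 2, ... without gaps; the function names are hypothetical.

```python
def omega_path(src, dst, n_bits):
    """Line address after each stage under destination routing."""
    mask = (1 << n_bits) - 1
    addr, path = src, []
    for stage in range(n_bits):
        addr = ((addr << 1) & mask) | ((dst >> (n_bits - 1 - stage)) & 1)
        path.append(addr)
    return path

def conflict_free(dest_of, n_bits):
    """True if no (stage, line) link is needed by two packets."""
    seen = set()
    for src, dst in dest_of.items():
        for link in enumerate(omega_path(src, dst, n_bits)):
            if link in seen:
                return False
            seen.add(link)
    return True

def after_sorting(destinations):
    """Model the sorter output: sorted destinations on inputs 0, 1, 2, ..."""
    return dict(enumerate(sorted(destinations)))

sorted_ok = conflict_free(after_sorting([6, 0, 5, 2, 3]), 3)
unsorted_ok = conflict_free({0: 0, 4: 2}, 3)   # the blocking pair 0→0, 4→2
```

Here the sorted and packed set passes cleanly, whereas the pair 0→0 and 4→2 fed directly from inputs 0 and 4 does not.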

Banyan networks. Only one path is provided between each source and destination. The number of intermediate stages is flexible. An approach from graph theory. SW-Banyan, CC-Banyan, Barrel Shifter. Irregular structure is allowed.

Batcher-banyan. If there are multiple packets to the same destination, the conflict free condition is broken → the other packets may conflict. An extension of the banyan network is required. The number of stages is large → large pass through time. The structure of the sorting network is simple.

Classification of MINs. Blocking: Omega, Baseline, Generalized Cube, π. Rearrangeable: Benes, Clos. Nonblocking: Clos, Batcher-Banyan.

Fault tolerant MINs Multiple paths Redundant structure is required. On-the-fly fault recovery is difficult. Improving chip yield.

Extra Stage Cube (ESC). An extra stage + bypass mechanism. If there is a fault (X in the figure) on stages or links, another path is used.

The buffer in a switching element. Conflicting packets are stored in buffers.

Hot spot contention. Buffers saturate in a tree shape spreading from the hot spot (Tree Saturation).

Relaxing the hot spot contention. Wormhole routing with virtual channels → Direct network. Message Combining: multiple packets are combined into one packet inside a switching element (IBM RP3). Implementation is difficult (implemented in SNAIL).

Other issues in MINs MIN with cache control mechanism  Directory on MIN  Cache Controller on MIN MINs with U-turn path → Fat tree

Exercise. Every path between source and destination is determined by destination routing in the Omega network. Prove (or explain) the above statement for an Omega network with 8 inputs/outputs.