Interconnect Networks

Generic scalable multiprocessor architecture
On-chip interconnects (manycore processors)
Off-chip interconnects (clusters of servers)
Network characteristics: bandwidth and latency

Scalable interconnection network
At the core of parallel computer architecture.
Requirements and trade-offs at many levels.
Still little consensus at this time.
Interactions across levels (e.g., network-level optimizations may conflict with messaging-level optimizations).
Workload.
Performance metrics.
Need a holistic understanding.

Network components
Network interface (card): communication between a node and the network.
Link: a bundle of wires and fibers that carries signals.
Switch: connects a fixed number of input channels to a fixed number of output channels. In this community, switches may also perform router functions.

Switch
The cross-bar can realize a communication from any input port to any output port.

Cross-bar functionality
All permutations can be realized simultaneously.
[Figure: a 4x4 cross-bar realizing the permutations (1, 2, 3, 4) -> (4, 3, 2, 1) and (1, 2, 3, 4) -> (3, 1, 2, 4)]
Permutation: a communication pattern in which each source appears exactly once and each destination appears exactly once, e.g. (1, 2, 3, 4) -> (3, 1, 2, 4).
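The definition of a permutation pattern can be checked mechanically. The sketch below is only an illustration: the function name `is_permutation` and the dictionary encoding (input port mapped to output port) are assumptions made for this example, not part of the slides.

```python
def is_permutation(mapping):
    """Return True if `mapping` (input port -> output port) is a permutation.

    Each source appears once by construction (dict keys are unique), so the
    check is that every destination also appears exactly once.
    """
    ports = sorted(mapping.keys())
    return sorted(mapping.values()) == ports


# The two patterns named on the slide.
reverse = {1: 4, 2: 3, 3: 2, 4: 1}   # (1, 2, 3, 4) -> (4, 3, 2, 1)
rotate = {1: 3, 2: 1, 3: 2, 4: 4}    # (1, 2, 3, 4) -> (3, 1, 2, 4)

# Hypothetical non-permutation: inputs 1 and 2 both target output 2.
conflict = {1: 2, 2: 2, 3: 1, 4: 4}

print(is_permutation(reverse), is_permutation(rotate), is_permutation(conflict))
```

A cross-bar can serve `reverse` and `rotate` in a single step; `conflict` would require one of the two contending messages to wait.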

Switch example: 24-port 1 Gbps Ethernet switch
24 input ports and 24 output ports: each Ethernet jack has one input port and one output port.
All 24 machines can send and receive simultaneously.
[Figure: machines connected through Ethernet cards to the switch]

Alternatives to cross-bars
A question: why buffers when we can always do permutation?
An N x N cross-bar has O(N^2) cross points (on/off switches), so it is expensive and does not scale.
An alternative for low-end switches: bus and memory. When the bus and memory are fast enough, moving data between input and output ports is like a memory copy in a typical computer.

Bus and memory alternative to crossbar
Realizing (1, 2, 3, 4) -> (4, 3, 2, 1):
Read from input port 1 to memory A
Read from input port 2 to memory B
Read from input port 3 to memory C
Read from input port 4 to memory D
Run forwarding logic (find out the output ports)
Write A to output port 4
Write B to output port 3
Write C to output port 2
Write D to output port 1
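The read, forward, and write steps above can be sketched as a small simulation. This is a toy model under stated assumptions: `forward_via_memory` is a hypothetical name, packets are plain strings, and each read or write is counted as one bus transfer.

```python
def forward_via_memory(inputs, out_port_of):
    """Store-and-forward through a shared memory (toy model).

    inputs:      dict mapping input port -> packet
    out_port_of: dict mapping input port -> output port (forwarding decision)
    Returns (outputs, bus_transfers).
    """
    memory = {}
    bus_transfers = 0

    # Phase 1: read every input port into its own memory buffer.
    for port, packet in inputs.items():
        memory[port] = packet
        bus_transfers += 1

    # Phase 2: run the forwarding logic, then write each buffer out.
    outputs = {}
    for port, packet in memory.items():
        outputs[out_port_of[port]] = packet
        bus_transfers += 1

    return outputs, bus_transfers


outputs, transfers = forward_via_memory(
    {1: "A", 2: "B", 3: "C", 4: "D"},
    {1: 4, 2: 3, 3: 2, 4: 1},
)
print(outputs, transfers)
```

Every packet crosses the bus twice, once on the way in and once on the way out, which is what limits this design to low port counts.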

Bus and memory alternative to crossbar
A typical northbridge bandwidth is a few GBps. Assuming the bandwidth is 4 GBps, how many ports can the northbridge support in a 100 Mbps Ethernet switch?
This is why the approach can only be used in low-end switches!
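A rough back-of-the-envelope estimate for the question above can be written as a short calculation. The assumptions are mine, not the slides': 1 GB is taken as 10^9 bytes, every port receives at full line rate, each bit crosses the bus twice (once into memory and once out), and protocol overhead is ignored. The helper name `max_ports` is hypothetical.

```python
def max_ports(bus_gbytes_per_s, port_mbits_per_s, crossings_per_bit=2):
    """Estimate how many ports a shared bus can keep busy at line rate.

    crossings_per_bit=2 assumes each bit is written into memory and then
    read back out, so it consumes bus bandwidth twice.
    """
    bus_mbits_per_s = bus_gbytes_per_s * 8 * 1000   # GB/s -> Mb/s (decimal units)
    return int(bus_mbits_per_s // (port_mbits_per_s * crossings_per_bit))


print(max_ports(4, 100))    # 100 Mbps ports
print(max_ports(4, 1000))   # 1 Gbps ports
```

Under these assumptions a 4 GBps bus sustains on the order of 160 ports at 100 Mbps but only about 16 ports at 1 Gbps, which is consistent with the slide's conclusion about low-end switches.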

Another alternative: multistage interconnection network
Realizes all permutations without controlling O(N^2) cross-points.
Examples: Clos networks, Benes networks.

Characteristics of a network
Topology (what): the physical interconnection structure of the network graph. It physically limits the performance of the network.
Routing algorithm (which): restricts the set of paths that messages can follow.
Switching strategy (how): how data in a message traverses a route (passing through routers).
Flow control mechanism (when): when a message, or portions of it, traverses a route, and what happens when traffic is encountered.

Topology
How the components are connected. Important properties:
Diameter: the maximum distance between any two nodes in the network (hop count, or number of links).
Nodal degree: how many links connect to each node.
Bisection bandwidth: the minimum, over all ways of splitting the nodes into two equal halves, of the bandwidth between the two halves.
A good topology has a small diameter, a small nodal degree, and a large bisection bandwidth.
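For small graphs, the three properties can be computed directly from an adjacency list. The following is a brute-force sketch, assuming undirected links of unit bandwidth and an even number of nodes; the function names are illustrative, and the bisection search is exponential in the node count, so it is only suitable for tiny examples.

```python
from collections import deque
from itertools import combinations


def hops_from(adj, src):
    """Breadth-first search: hop count from src to every reachable node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def diameter(adj):
    """Largest shortest-path hop count over all node pairs."""
    return max(max(hops_from(adj, s).values()) for s in adj)


def max_degree(adj):
    """Largest number of links attached to any node."""
    return max(len(nbrs) for nbrs in adj.values())


def bisection_width(adj):
    """Fewest links cut by any split into two equal halves (brute force)."""
    nodes = list(adj)
    best = None
    for half in combinations(nodes, len(nodes) // 2):
        half = set(half)
        crossing = sum(1 for u in half for v in adj[u] if v not in half)
        best = crossing if best is None else min(best, crossing)
    return best


# A 4-node ring as a small test case.
ring4 = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(diameter(ring4), max_degree(ring4), bisection_width(ring4))
```

With unit-bandwidth links, the bisection bandwidth is the bisection width multiplied by the per-link bandwidth.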

Topology
Regular topologies: nodes are connected according to some kind of pattern. The graph has a structure. Nodes are identified by coordinates. Routing can usually be pre-determined from the coordinates of the nodes.
Irregular topologies: nodes are connected arbitrarily. The graph does not have a structure, e.g. the Internet. More extensible than regular topologies. Usually use variations of shortest-path routing.

Linear Arrays and Rings
[Figure: linear array; ring (torus); short-wire torus]
Diameter = ? Nodal degree = ? Bisection bandwidth = ?

Describing linear array and ring
Array: nodes are numbered 0, 1, …, N-1. Node i is connected to node i+1, for 0 <= i <= N-2.
Ring: nodes are numbered 0, 1, …, N-1. Node i is connected to node (i+1) mod N, for all 0 <= i <= N-1.
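The two connection rules translate directly into code. The sketch below builds the link lists and gives the hop distance between two nodes in each topology; the function names are chosen for this example only.

```python
def array_links(n):
    """Linear array: node i is linked to node i+1 for 0 <= i <= n-2."""
    return [(i, i + 1) for i in range(n - 1)]


def ring_links(n):
    """Ring: node i is linked to node (i+1) mod n for 0 <= i <= n-1."""
    return [(i, (i + 1) % n) for i in range(n)]


def array_hops(i, j):
    """Hop count between nodes i and j in a linear array."""
    return abs(i - j)


def ring_hops(i, j, n):
    """Hop count in a ring: go whichever way around is shorter."""
    d = abs(i - j)
    return min(d, n - d)


n = 8
array_diameter = max(array_hops(i, j) for i in range(n) for j in range(n))
ring_diameter = max(ring_hops(i, j, n) for i in range(n) for j in range(n))
print(len(array_links(n)), len(ring_links(n)), array_diameter, ring_diameter)
```

The wrap-around link is what lets the ring roughly halve the array's diameter at the cost of one extra link.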

Multidimensional Meshes and Tori
d-dimensional array/torus: N = k_{d-1} x k_{d-2} x … x k_0.
Each node is described by a d-vector of coordinates.
Node (i_{d-1}, i_{d-2}, …, i_0) is connected to ???

More about multi-dimensional meshes and tori
d-dimensional k-ary mesh (torus).
Each node is described by a d-vector of coordinates. The value of item i in the vector is between 0 and k_i - 1.
Diameter = ? Nodal degree = ? Bisection bandwidth = ?
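One way to work through the connectivity and diameter questions for meshes and tori is to generalize the array and ring rules to each coordinate. The sketch below assumes the usual definition (neighbors differ by one in exactly one coordinate, with wrap-around only in the torus); the function names and tuple encoding are illustrative.

```python
def neighbors(coord, sizes, wrap):
    """Neighbors of `coord` in a d-dimensional mesh (wrap=False) or torus (wrap=True).

    coord: tuple of d coordinates; sizes: tuple of the d dimension sizes.
    """
    result = []
    for dim, (i, k) in enumerate(zip(coord, sizes)):
        for step in (-1, +1):
            j = i + step
            if wrap:
                j %= k                      # torus: wrap around the edge
            elif not 0 <= j < k:
                continue                    # mesh: no link past the edge
            nb = coord[:dim] + (j,) + coord[dim + 1:]
            # Skip self-links (k = 1) and duplicates (k = 2 in a torus).
            if nb != coord and nb not in result:
                result.append(nb)
    return result


def mesh_diameter(sizes):
    """Corner to opposite corner: k_i - 1 hops in each dimension."""
    return sum(k - 1 for k in sizes)


def torus_diameter(sizes):
    """With wrap-around, at most floor(k_i / 2) hops in each dimension."""
    return sum(k // 2 for k in sizes)


print(neighbors((0, 0), (4, 4), wrap=False))
print(neighbors((0, 0), (4, 4), wrap=True))
print(mesh_diameter((4, 4)), torus_diameter((4, 4)))
```

A corner node of the 4x4 mesh has two neighbors, while every node of the 4x4 torus has four, which is the 2d nodal degree an interior mesh node also reaches.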

Hypercubes
Also called binary n-cubes. Number of nodes: N = 2^n.
Each node is identified by the n-bit binary representation of its number.
There is a link between two nodes whose binary representations differ in exactly one bit.
Diameter = ? Nodal degree = ? Bisection bandwidth = ?
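The one-bit-difference rule maps neatly onto bit operations, which gives a quick way to explore the questions above. This is a sketch with illustrative function names; the cut it counts (splitting on the top address bit) is one balanced cut, offered as an example rather than a proof of the minimum.

```python
def hypercube_neighbors(node, n):
    """Flip each of the n address bits in turn to get the n neighbors."""
    return [node ^ (1 << bit) for bit in range(n)]


def hypercube_hops(a, b):
    """Shortest path length equals the number of differing bits (Hamming distance)."""
    return bin(a ^ b).count("1")


def links_across_top_bit(n):
    """Count links between the half with top bit 0 and the half with top bit 1."""
    top = 1 << (n - 1)
    return sum(
        1
        for node in range(1 << n) if not node & top
        for nb in hypercube_neighbors(node, n) if nb & top
    )


n = 3
diam = max(hypercube_hops(a, b) for a in range(1 << n) for b in range(1 << n))
print(hypercube_neighbors(0b000, n), diam, links_across_top_bit(n))
```

For n = 3 this reports a nodal degree of 3, a diameter of 3, and 4 links across the cut, in line with the usual results of n, n, and N/2 for a binary n-cube.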

K-ary n-cube (n-dimensional, k-ary mesh/torus)
Extended from binary (hypercube) to k-ary.
n dimensions, each with k elements.
Each node is identified by an n-digit base-k number.
Routing: dimension-order routing.
[Figure: 4-ary 0-cube, 4-ary 1-cube, 4-ary 2-cube, 4-ary 3-cube]
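Dimension-order routing corrects one coordinate at a time until the message reaches its destination. The sketch below is a minimal version under assumptions of my own: coordinates are tuples of base-k digits, index 0 of the tuple is corrected first, and in the torus case the shorter direction around each ring is taken (ties go in the increasing direction).

```python
def dimension_order_route(src, dst, k, torus=True):
    """Return the list of nodes visited from src to dst, one dimension at a time."""
    path = [tuple(src)]
    cur = list(src)
    for dim in range(len(cur)):
        while cur[dim] != dst[dim]:
            if torus:
                forward = (dst[dim] - cur[dim]) % k      # hops going "up" with wrap
                step = 1 if forward <= k - forward else -1
                cur[dim] = (cur[dim] + step) % k
            else:
                cur[dim] += 1 if dst[dim] > cur[dim] else -1
            path.append(tuple(cur))
    return path


# 4-ary 2-cube: route from (0, 0) to (3, 2).
print(dimension_order_route((0, 0), (3, 2), k=4, torus=True))
print(dimension_order_route((0, 0), (3, 2), k=4, torus=False))
```

In the torus the first coordinate is fixed in a single wrap-around hop, so the route takes 3 hops instead of the 5 needed in the mesh.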

Trees
Fixed degree, log(N) diameter, O(1) bisection bandwidth.
Routing: go up to the common ancestor, then go down.
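The up-then-down routing rule can be illustrated on a complete binary tree. The numbering here is an assumption for the example, not something the slides specify: the root is node 1 and the children of node i are 2i and 2i+1, so the parent of any node is its number divided by two (rounded down).

```python
def tree_route(src, dst):
    """Route in a complete binary tree: climb to the common ancestor, then descend."""
    up, down = [src], [dst]
    a, b = src, dst
    while a != b:
        # The larger number is at the same depth or deeper, so move it up.
        if a > b:
            a //= 2
            up.append(a)
        else:
            b //= 2
            down.append(b)
    # `up` ends at the common ancestor; `down` also ends there, so drop it
    # from `down` and reverse the rest to get the descending half.
    return up + down[-2::-1]


print(tree_route(4, 5))   # siblings: meet at their parent
print(tree_route(4, 7))   # opposite subtrees: meet at the root
```

Traffic between opposite subtrees always passes through the root, which is the O(1) bisection bandwidth bottleneck noted above.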

Irregular topology
An irregular topology does not have any special mathematical properties.
It can be expanded in any way.
There is no easy way to route: routes need to be computed, as in the Internet.
By contrast, routes in a regular network can usually be determined from the coordinates of the source and destination.

Direct and indirect networks
All the previously discussed networks are direct networks, in that the compute nodes are directly attached to the nodes in the topology.
[Figure: an example mesh system in which each switch is a 5x5 switch]

Indirect networks
Compute nodes are not attached directly to each switch, but rather to the network as a whole.
A central interconnect connects all compute nodes.
The network emulates the cross-bar switch functionality.

Fully connected network
Different organizations are possible, e.g. all nodes connected by one switch (a crossbar switch).
All permutation communications (each node sends one message and receives one message) can be realized.

Multistage network
Tries to emulate the cross-bar connection: realizing permutations without blocking.
Uses smaller cross-bar switches (2x2, 4x4) as the building block.
Usually O(N lg(N)) switches in lg(N) stages.
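To make the switch-count claim concrete, the sketch below counts stages, switches, and cross-points for a butterfly-style network, assuming one layer of N/k switches per stage and log_k(N) stages, and compares that with the N^2 cross-points of a single cross-bar. The function name `multistage_cost` and the choice of a butterfly layout are assumptions for illustration; other multistage networks (e.g. Benes) use more stages.

```python
def multistage_cost(n_ports, radix=2):
    """Return (stages, switches, crosspoints) for a butterfly-style network.

    Assumes n_ports is an exact power of radix, with n_ports/radix switches
    of size radix x radix in each of the log_radix(n_ports) stages.
    """
    stages, size = 0, 1
    while size < n_ports:
        size *= radix
        stages += 1
    if size != n_ports:
        raise ValueError("n_ports must be a power of radix")
    switches = stages * (n_ports // radix)
    crosspoints = switches * radix * radix
    return stages, switches, crosspoints


for n in (8, 1024):
    stages, switches, crosspoints = multistage_cost(n)
    print(n, stages, switches, crosspoints, "vs crossbar", n * n)
```

For 1024 ports this gives 10 stages and 5120 two-by-two switches, roughly 20 thousand cross-points against about a million for a full cross-bar.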

Multi-stage network examples
[Figure: (a) an 8-input butterfly network; (b) an 8-input Benes network]
The butterfly network is blocking: there exist some permutations that result in link contention.
The Benes network is (rearrangeably) non-blocking: if the permutation is known a priori, it can always be realized without link contention.
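The blocking behavior of a butterfly can be demonstrated with a small model. The sketch assumes a common textbook formulation, which may differ in wiring detail from the figure in the slides: with 2x2 switches and destination-tag routing, stage s replaces one address bit of the packet's current row with the corresponding destination bit (most significant bit first), and two packets contend if they need the same output row of the same stage.

```python
def butterfly_conflicts(perm):
    """List (stage, row, src_a, src_b) contentions for perm[src] = dst."""
    n_ports = len(perm)
    n_bits = n_ports.bit_length() - 1
    if 1 << n_bits != n_ports:
        raise ValueError("number of ports must be a power of two")

    conflicts = []
    for stage in range(n_bits):
        bit = n_bits - 1 - stage            # address bit fixed by this stage
        low_mask = (1 << bit) - 1           # bits not yet corrected
        seen = {}
        for src, dst in enumerate(perm):
            # Corrected high bits come from dst, remaining low bits from src.
            row = (dst & ~low_mask) | (src & low_mask)
            if row in seen:
                conflicts.append((stage, row, seen[row], src))
            else:
                seen[row] = src
    return conflicts


identity = list(range(8))
bit_reversal = [0, 4, 2, 6, 1, 5, 3, 7]     # 3-bit address reversed

print(butterfly_conflicts(identity))
print(butterfly_conflicts(bit_reversal))
```

The identity permutation passes cleanly, while bit reversal already collides in the first stage (sources 0 and 4 both need row 0), which is an instance of the link contention described above.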

Clos Network
Three stages: ingress stage, middle stage, and egress stage.
The ingress stage has r switches of size n x m, and the egress stage has r switches of size m x n.
The middle stage has m switches of size r x r.
Each ingress/egress switch connects to all m middle switches (one port to each switch).

Clos Network
A Clos network is strictly non-blocking when m >= 2n-1.
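The condition on m can be checked, together with the hardware cost, for a given (n, m, r) design. In this sketch the function name and the returned dictionary are illustrative; the rearrangeable condition m >= n is a standard result that the slides do not state, and the cross-point count simply sums the three stages described for the Clos network.

```python
def clos_properties(n, m, r):
    """Summarize a 3-stage Clos network.

    n: inputs per ingress switch (and outputs per egress switch)
    m: number of middle-stage switches
    r: number of ingress switches (and of egress switches)
    """
    return {
        "ports": n * r,
        "strict_nonblocking": m >= 2 * n - 1,   # condition from the slide
        "rearrangeable": m >= n,                # standard result, not in the slides
        # r ingress (n x m) + r egress (m x n) + m middle (r x r) switches.
        "crosspoints": 2 * r * n * m + m * r * r,
    }


design = clos_properties(n=8, m=15, r=8)
print(design, "vs crossbar", design["ports"] ** 2)
```

For this 64-port example the strictly non-blocking Clos design needs 2880 cross-points, compared with 4096 for a single 64x64 cross-bar; the saving grows with the port count.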

Fat-Trees
Fatter links (really, more of them) as you go up, so bisection bandwidth scales with N.
Not practical: the root is an N x N switch.

Practical Fat-trees
Use smaller switches to approximate large switches.
Connectivity is reduced, but the topology is now implementable.
Most large commodity clusters use this topology.
Also called a constant bisection bandwidth (CBB) network.

Clos network and fat-tree (folded Clos)
[Figure: a generic 3-stage Clos network and a generic 2-level fat-tree (folded Clos)]

Physical constraints on topologies
Number of dimensions:
2 or 3 dimensions: can be laid out physically; short wires, easy to build; many hops, low bisection bandwidth.
>= 4 dimensions: harder to build, longer wires; fewer hops, better bisection bandwidth.
K-ary n-cubes provide a good framework for comparison.