1 Winter 2006 ENGR 9861 – High Performance Computer Architecture March 2006 Interconnection Networks

2 Winter 2006 Introduction When considering interconnection networks for parallel computation, there are many concepts shared with LANs (Local Area Networks) and WANs (Wide Area Networks). Interconnection networks for parallel computing are a wide and interesting field, with many areas of theoretical and practical research. Reference: "Parallel Computer Architecture", Ch. 10, Culler & Singh.

3 Winter 2006 Introduction There are different ways of looking at this topic: Firstly, the interconnection structure often has mathematical properties that reflect the communication patterns of important algorithms (a regular structure). Secondly, the design of the physical link between asynchronous elements is a huge area of research. Thirdly, contention for shared resources within a network is also a large area of research.

4 Winter 2006 Basic Definitions Ch 10.1 Communication Assist

5 Winter 2006 Basic Definitions Some terms: CA: Communications Assist NI: Network Interface Mem: Memory P: Processor Communication Requirements for this generic view: The interconnection network will need to provide network transactions that support the programming model. Latency should be minimized. Adequate concurrent transactions must be supported.

6 Winter 2006 Basic Definitions Physical Protocol: converts between the analog signals on the wire and the digital symbols seen by the link level. Link Protocol: responsible for grouping symbols into packets. Node Level Protocol: responsible for attaching the information needed so that the target CA can accomplish the transfer.

7 Winter 2006 Basic Definitions We can look at an IN as a graph that contains vertices (processing hosts or switch elements) and channels between vertices. Channels have the following properties: Width w (in bits) Signaling Rate f = 1 / T

8 Winter 2006 Basic Definitions Channel Bandwidth b = w * f The amount of data transferred across a link in one cycle is called a physical unit or phit. Switches connect input channels to output channels. The number of channels connected is called the switch degree.
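A quick illustration of the channel parameters just defined (the width and cycle-time values below are made up, not from the slides): one phit of w bits crosses the link per cycle, giving b = w * f.

```python
# Minimal sketch relating channel width w, signaling rate f = 1/T, and
# channel bandwidth b = w * f.  All numeric values are illustrative.

def channel_bandwidth(w_bits: int, f_hz: float) -> float:
    """Channel bandwidth b = w * f, in bits per second."""
    return w_bits * f_hz

w = 16                      # channel width in bits (one phit = 16 bits here)
f = 1.0 / 2e-9              # signaling rate f = 1/T, assuming a 2 ns cycle time
b = channel_bandwidth(w, f)
print(f"b = {b / 1e9:.1f} Gb/s, phit size = {w} bits")
```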

9 Winter 2006 Network Components The following components make up a network: Topology: the structure of the network graph, e.g. 2D grid, 3D cube, irregular, etc. A direct network has a host (processing element) connected to each switch. An indirect network has hosts connected to only a subset of the switches; the hosts are then on the edge of the network graph.

10 Winter 2006 Network Components Routing Algorithm: the path a message takes through the network is called a route, and the procedure that determines which route each message takes is called the routing algorithm. Switching Strategy: how a message travels its route. Circuit Switching: an end-to-end connection is established and the same route is used until the entire message is transferred; the route can be reversed as well. Packet Switching: the message is broken into packets, each carrying its own routing information, so each packet can be routed individually. This requires routing-tag overhead.

11 Winter 2006 Network Components Flow Control Mechanism: controls when a message, or part of it, moves along its route. Flow control becomes a necessity when a network resource has to be shared by multiple messages at the same time. Flow control options: stall in place, buffer, re-route, or discard.

12 Winter 2006 Network Components The smallest unit of information that can be accepted or rejected by a node in the network is called a flit (flow control unit). How big can a flit be? ANS: it can be as big as the entire message or packet, and as small as a phit. Some other terms used when talking about networks for parallel processing: Diameter: the maximum, over all node pairs, of the length of the shortest path between two nodes through the network. Routing Distance: the number of links traversed between source and destination. Average Distance: the average of the routing distance over all pairs.

13 Winter 2006 Packet Formatting

14 Winter 2006 Packet Format Header: contains routing and control info so that the switches can interpret what to do when the packet arrives. Payload: The information contained within the packet. Trailer: Usually contains an error checking code.

15 Winter 2006 Packet Format In parallel processing networks, as in LANs, WANs, and the Internet, we have the issues of encapsulation and fragmentation. Encapsulation involves carrying information from a higher level of abstraction within the current layer. Fragmentation involves splitting the higher-level information into a sequence of messages.

16 Winter 2006 Communication Performance There are four components that affect the time to transfer n bits from source to destination: Time_S-D(n) = Overhead + Routing Delay + Channel Occupancy + Contention Delay

17 Winter 2006 Performance Overhead: comes from getting the message into and out of the network (i.e., it is largely incurred by the CA). Channel Occupancy: gives a lower bound on latency; it can simply be viewed as the time taken for the message to get from source to destination over a direct link. The source CA takes time to process the communication request, each channel traversed by the packet adds delay, and the destination CA takes time to process the packet.

18 Winter 2006 Performance Overall, the occupancy of the channel is (n + n_e) / b, where n = number of bits in the payload, n_e = number of bits in the header and trailer, and b = channel bandwidth. We can also look at the effective bandwidth, b * n / (n + n_e); the fraction n / (n + n_e) is the share of the raw bandwidth that carries useful data.
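A small hedged sketch of the two expressions above; the payload size, envelope size, and link bandwidth below are illustrative values, not taken from the slides.

```python
# Channel occupancy (n + n_e)/b and the payload efficiency n/(n + n_e).

def occupancy(n_bits: int, ne_bits: int, b_bits_per_s: float) -> float:
    """Time the channel is occupied by one packet: (n + n_e) / b."""
    return (n_bits + ne_bits) / b_bits_per_s

def efficiency(n_bits: int, ne_bits: int) -> float:
    """Fraction of the raw bandwidth carrying payload: n / (n + n_e)."""
    return n_bits / (n_bits + ne_bits)

n, ne, b = 1024, 64, 8e9     # 1024-bit payload, 64-bit envelope, 8 Gb/s link
print(f"occupancy = {occupancy(n, ne, b) * 1e9:.1f} ns")
print(f"effective bandwidth = {efficiency(n, ne) * b / 1e9:.2f} Gb/s")
```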

19 Winter 2006 Performance Routing Delay: each channel in the route incurs a small delay, and these build up (we consider the time taken at the node-to-switch interface to be part of the routing delay). The components of routing delay are the routing distance (h), the number of channels used in the route, and the switching delay (Δ), the time taken for a switch to select the proper output port. h depends on the network topology, the routing algorithm used, and the specific nodes involved in the transaction.

20 Winter 2006 Unloaded Latency (based on switching strategy) Store-and-Forward Routing (packet switched): the entire packet is received by a switch before it is forwarded onto the next channel. Unloaded latency: T_sf(n, h) = h * (n/b + Δ), where n is the number of bits in the packet, including header and trailer.
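A minimal sketch of the store-and-forward latency form given above, T_sf(n, h) = h * (n/b + Δ); the packet size, hop count, bandwidth, and switch delay are illustrative assumptions.

```python
# Unloaded store-and-forward latency: every one of the h switches receives
# the whole packet before forwarding it, so each hop costs n/b + delta.

def store_and_forward_latency(n_bits, h_hops, b_bits_per_cycle, delta_cycles):
    return h_hops * (n_bits / b_bits_per_cycle + delta_cycles)

# Example: 512-bit packet, 5 hops, 16 bits/cycle, 4-cycle switch delay.
print(store_and_forward_latency(512, 5, 16, 4), "cycles")   # 5 * (32 + 4) = 180
```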

21 Winter 2006 Unloaded Latency (based on switching strategy) Circuit switched: once the route is set up it is maintained, so we only incur the switching latency while the route is being established.

22 Winter 2006 Unloaded Latency (based on switching strategy) In the case of circuit switching we can note the following: as the message size increases, the latency contributed by route setup (h * Δ), and hence the topology, becomes insignificant. How can we reduce the latency in the case of store-and-forward packet switching?

23 Winter 2006 Unloaded Latency (based on switching strategy) Solution: fragment the message into smaller packets of size n_f, which flow through the network in a pipelined fashion. The unloaded latency then has the same form as before, but with the per-hop term applied to the fragment size: T(n, h) ≈ h * (n_f/b + Δ) + (n − n_f)/b.

24 Winter 2006 Unloaded Latency (based on switching strategy) The previous approach is commonly used in the Internet and other large networks. In the case of parallel processing, cut-through routing can be used: once a few phits have been received by the switch, the rest of the packet is routed straight to the output. The resulting latency differs from the circuit-switched case.
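For comparison, a hedged sketch of the two unloaded-latency forms discussed so far: store-and-forward pays the full packet time at every hop, while cut-through (in its usual textbook form, assumed here) pays it only once plus a per-hop switch delay. Parameter values are illustrative.

```python
# Store-and-forward vs. cut-through unloaded latency (no contention).

def t_store_and_forward(n, h, b, delta):
    return h * (n / b + delta)          # whole packet buffered at every hop

def t_cut_through(n, h, b, delta):
    return n / b + h * delta            # only the per-hop switch delay accumulates

n, h, b, delta = 512, 5, 16, 4
print("store-and-forward:", t_store_and_forward(n, h, b, delta), "cycles")  # 180
print("cut-through:      ", t_cut_through(n, h, b, delta), "cycles")        # 52
```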

25 Winter 2006 Unloaded Latency (Based on Switching Strategy)

26 Winter 2006 Contention As in traditional networking, contention occurs when two incoming messages need to be routed to the same output at the same time. In store-and-forward routing the switch buffers an entire packet; under contention, one packet is switched to the output and the other is blocked until the next switching cycle.

27 Winter 2006 Contention In circuit switching, some type of probe is usually sent from the source node to the destination; if there is contention, the probe is resent after some delay. Cut-through routing can handle contention in two ways. Virtual cut-through: route one of the packets into a buffer, then route it in the next switch cycle. Under contention this has the same penalty as store-and-forward routing.

28 Winter 2006 Contention Wormhole: only a few flits from the head of the packet are buffered; the tail of the packet stays in place along the route, holding the channels it occupies. From the sender's point of view this is similar to holding a circuit open.

29 Winter 2006 Bandwidth Looking at bandwidth from the viewpoint of a single node, the channel has a raw bandwidth that is higher than the rate at which the node can send useful data: b_eff = b * n / (n + n_e).

30 Winter 2006 Bandwidth Taking into account the routing delay at the switch (Δ), we have the following expression: b_eff = n * w / (n + n_e + Δ), where n and n_e are now measured in phits and Δ in cycles. w is included here in case the channel width is more than one bit.
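A hedged reading of the expression above (the exact form is reconstructed from the slide's notes that n is in phits and that w appears for multi-bit channels); the numbers below are illustrative.

```python
# Effective per-node bandwidth once the per-packet routing delay is included:
#     b_eff = n * w / (n + n_e + delta)   bits per cycle,
# with n and n_e in phits and delta in cycles.

def effective_node_bandwidth(n_phits, ne_phits, w_bits, delta_cycles):
    return n_phits * w_bits / (n_phits + ne_phits + delta_cycles)

# 64-phit payload, 4-phit envelope, 16-bit channel, 8-cycle routing delay.
print(effective_node_bandwidth(64, 4, 16, 8), "bits/cycle")   # 1024/76 ≈ 13.5
```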

31 Winter 2006 Bandwidth These expressions are useful for looking at the bandwidth available to one node. What if we want a measure of the overall bandwidth in a network? The most common measure is the bisection bandwidth: the sum of the bandwidths of the minimum set of channels that, if removed, partition the network into two equal, unconnected sets of nodes. This has a nice property when considering a uniform communication pattern. What is it?

32 Winter 2006 Bandwidth ANSWER: half of the messages are expected to cross the bisection in each direction. With this in mind, what is wrong with this notion of global or "aggregate" bandwidth? ANSWER: if communication is localized, far fewer messages cross the bisection, so the bisection bandwidth understates the bandwidth actually available to the program.

33 Winter 2006 Total Bandwidth and Average Link Utilization Total bandwidth = C * b (bits/sec) = C * w (bits/cycle) = C (phits/cycle). Assume each of the N hosts issues a packet every M cycles, with average routing distance h; each packet then occupies h channels for l = n / w cycles. The total load on the network is (N * h * l) / M phits/cycle.

34 Winter 2006 Total Bandwidth and Average Link Utilization The average link utilization is ρ = (N * h * l) / (M * C), which must be less than 1. This is discussed on p. 762 of the text.
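A short sketch of the utilization expression above, ρ = (N * h * l) / (M * C); the host count, routing distance, packet size, issue interval, and channel count are illustrative assumptions.

```python
# Average link utilization: total offered load (phits/cycle) divided by the
# total channel capacity C (phits/cycle).  Packets are n_phits long, so each
# one occupies a channel for l = n_phits cycles.

def link_utilization(N_hosts, h_avg, n_phits, M_cycles, C_channels):
    l = n_phits                                   # cycles per channel per packet
    return (N_hosts * h_avg * l) / (M_cycles * C_channels)

# 64 hosts, average distance 4, 32-phit packets, one packet every 256 cycles,
# 256 channels in the network.
print(f"rho = {link_utilization(64, 4, 32, 256, 256):.3f}")   # 0.125
```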

35 Winter 2006 Bandwidth The number of links or channels per node (C / N) gives the communication bandwidth available per node (phits/cycle/node). This is consumed in direct proportion to the message size and to the routing distance.

36 Winter 2006 Factors That Limit ρ Before we look at why the link utilization is less than one (in some cases much less than one), let us consider the various properties of the network. The number of links per network node is a property of the topology.

37 Winter 2006 Factors That Limit ρ The average routing distance depends on the topology, the routing algorithm, the program's communication pattern, and the mapping of the program onto the machine. Good communication locality gives a small h; random communication gives the average routing distance; a bad pattern will cross the entire diameter.

38 Winter 2006 Factors That Limit ρ Factors: Communication may not be balanced over all links. Even if it is balanced, the routing algorithm may not support the communication pattern of the program. Contention for other networking resources may arise. These factors affect the saturation point of the network.

39 Winter 2006 What assumption is being made here?

40 Winter 2006 Topology of INs Before we discuss the different interconnection network topologies, note the following: the number of host nodes connected to the network will be denoted N, and the characteristics of each topology will be discussed as a function of N.

41 Winter 2006 Fully Connected Network This type of network connects all inputs to all outputs; it can be considered a single big switch. The diameter of such a network is 1 and the degree is N. Unfortunately, if there is a hardware failure in such a network, the entire network goes down, or at least full connectivity is lost.

42 Winter 2006 Fully Connected Network A bus is an example of a fully connected network. Its cost scales as O(N). Bandwidth: total bandwidth = O(1), bisection bandwidth = O(1). In practice bandwidth scaling is worse than O(1), since the clock rate drops as the number of ports grows due to RC loading.

43 Winter 2006 Fully Connected Networks A crossbar switch is another example. Bandwidth is O(N). Cost is O(N^2), why? As more inputs/outputs are added, the total number of crosspoints grows as N^2. The scalability of fully connected networks is poor for large numbers of hosts; usually only small components of the network (like a basic switching element) are fully connected.

44 Winter 2006 Linear Arrays Linear Array Assume we have N nodes (0..N−1) assembled in a linear fashion, with each adjacent pair connected by a bidirectional link. What is the diameter? ANS: N − 1. The average routing distance is ~2N/3, and the bisection is one link.

45 Winter 2006 Linear Arrays The route from node A to node B can be described by the operation B − A; this result is termed the relative address. It is a log2(N)-bit number, with positive values meaning travel away from node 0. This arrangement provides no fault tolerance.

46 Winter 2006 Ring, Bidirectional Links Easily constructed by connecting the ends of a linear array together. Degree: 2. Diameter: N/2. Bisection: 2. Average routing distance: N/3. Note there are two relative addresses, because we can travel in either direction. A ring also provides better fault tolerance.

47 Winter 2006 Ring, Unidirectional Links If we have a ring that can only transmit in one direction, it has the following properties: Diameter: N − 1. Average distance: N/2. Relative address: (B − A) mod N. Bisection width: 1.
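To summarize the last few slides, a small sketch tabulating the diameter, average distance, and bisection expressions exactly as the slides state them, for an N-node linear array and the two ring variants.

```python
# Topology properties from the slides, as a function of N.

def properties(N):
    return {
        "linear array":        {"diameter": N - 1,  "avg distance": 2 * N / 3, "bisection": 1},
        "bidirectional ring":  {"diameter": N // 2, "avg distance": N / 3,     "bisection": 2},
        "unidirectional ring": {"diameter": N - 1,  "avg distance": N / 2,     "bisection": 1},
    }

for topo, p in properties(16).items():
    print(f"{topo:20s} {p}")
```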

48 Winter 2006

49 Higher Dimension Meshes and Tori A d-dimensional array consists of N = k_(d-1) * k_(d-2) * ... * k_0 nodes, where k is a vector giving the extent of each dimension. Each node is identified by a coordinate vector (i_(d-1), ..., i_0) with 0 ≤ i_j < k_j.

50 Winter 2006 Higher Dimension Meshes and Tori Assuming the length along each dimension is equal, N = k^d. The degree of each node varies between d and 2d: nodes on the inside have the maximum degree and nodes on the corners the smallest. For example, for d = 3, corner nodes have 3 links or channels and interior nodes have 6. What about tori?

51 Winter 2006 Higher Dimension Meshes and Tori These arrays are called d-dimensional k-ary arrays. To extend to a torus, the edges are simply connected to the opposite side. Usually these types of structures are direct networks, meaning that every node contains a processing element. The network will scale by increasing k.

52 Winter 2006 Higher Dimension Arrays and Tori We can form a relative address by simply performing vector subtraction (unidirectional case): R = (b_(d-1) − a_(d-1), b_(d-2) − a_(d-2), ..., b_0 − a_0). Actual routing can be performed in any dimension order. The diameter is simply d * (k − 1). If k is even, the bisection of a d-dimensional k-ary structure is k^(d-1); if k is odd it may be a little larger.
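A minimal sketch of the expressions above for a d-dimensional k-ary array: per-dimension subtraction for the relative address (with an optional modular wrap for a unidirectional torus), diameter d*(k−1), and bisection k^(d−1) for even k. Coordinates and values below are illustrative.

```python
# Relative addressing and basic metrics for a d-dimensional k-ary array.

def relative_address(a, b, k, torus=False):
    """Per-dimension offset from node a to node b (coordinate tuples)."""
    if torus:                                    # unidirectional torus: wrap around
        return tuple((bi - ai) % k for ai, bi in zip(a, b))
    return tuple(bi - ai for ai, bi in zip(a, b))

def diameter(k, d):
    return d * (k - 1)                           # array (non-torus) diameter

def bisection(k, d):
    return k ** (d - 1)                          # exact when k is even

print(relative_address((1, 2, 3), (3, 0, 1), k=4, torus=True))  # (2, 2, 2)
print(diameter(4, 3), bisection(4, 3))                          # 9 16
```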

53 Winter 2006 Higher Dimension Arrays and Tori The average routing distance is the sum of the average distances in each dimension; roughly, h = d * (2/3) * k. Spatially these networks scale to whatever dimension we have: volume for d = 3 and planar space for d = 2, assuming the shortest possible wiring.

54 Winter 2006 Higher Dimension Arrays and Tori

55 Winter 2006 Higher Dimension Arrays and Tori

56 Winter 2006 Trees Unlike meshes, trees give an average routing distance that grows only logarithmically with N. A binary tree has a degree of 3 (three connections per node). Usually trees are used as indirect networks. Indirect case: the address of a leaf can be taken as a log2(N)-bit vector giving the path from the root to the host, with 0 = left and 1 = right. The diameter is 2 * log2(N), and the average routing distance is almost as large as the diameter.

57 Winter 2006 Trees Relative addressing can be accomplished with a bit-wise XOR. For example, to get the relative address from A to B: R = A XOR B. The position of the most significant 1 tells how many levels we go up; then we use the lower bits of B to descend to B. We may not have to go all the way to the root!

58 Winter 2006 Trees A = 0001 B = 0101 A XOR B = 0100
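A small sketch of the XOR-based tree routing just described, reproducing the slide's example (A = 0001, B = 0101). The function name and the fixed 4-bit address width are assumptions for illustration.

```python
# XOR tree routing: the most significant set bit of A XOR B says how many
# levels to climb; the low bits of B then steer the descent (0 = left, 1 = right).

def tree_route(a: int, b: int, bits: int):
    r = a ^ b
    if r == 0:
        return 0, ""                              # already at the destination
    levels_up = r.bit_length()                    # position of the most significant 1
    down_bits = format(b, f"0{bits}b")[-levels_up:]
    return levels_up, down_bits

# Slide example: A = 0001, B = 0101 -> A XOR B = 0100.
print(tree_route(0b0001, 0b0101, 4))              # (3, '101'): up 3 levels, then right, left, right
```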

59 Winter 2006 Trees We can have trees of higher order, called k-ary trees. We can also have fat trees, in which more bandwidth is assigned to the more heavily loaded links as we move towards the root. A big problem with ordinary trees is that all traffic between the two halves of the machine crosses a single link near the root, so the bisection is one link.

60 Winter 2006 Butterflies The construction of a butterfly is similar to that of a tree, but a butterfly has many roots. In addition, many parallel algorithms communicate in a butterfly pattern, e.g. the Fast Fourier Transform and Batcher's odd-even merge sort.

61 Winter 2006 Butterflies As a building block we start with 2 x 2 switch elements, set up so that addressing can occur: a bit of 0 causes the straight edge to be followed, while a 1 causes a crossover. In the case of a unidirectional indirect butterfly with N hosts, the bisection is N/2 links.
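A hedged sketch of the addressing rule just described: one destination bit is consumed per stage, 0 keeping the packet on the straight edge and 1 taking the cross edge. The MSB-first stage ordering and the assumption that N is a power of two are illustrative choices, not taken from the slides.

```python
# Trace the edge choices a packet makes through a log2(N)-stage butterfly.

def butterfly_route(dest: int, n_hosts: int):
    stages = n_hosts.bit_length() - 1             # log2(N) stages of 2x2 switches
    path = []
    for i in reversed(range(stages)):             # consume destination bits MSB-first
        bit = (dest >> i) & 1
        path.append("cross" if bit else "straight")
    return path

print(butterfly_route(0b101, 8))                  # ['cross', 'straight', 'cross']
```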

62 Winter 2006 Butterflies

63 Winter 2006 Butterflies When considering scalability, butterflies can be better than meshes and trees in that there are a total of N * log2(N) links (in the case of the previous figure), with packets crossing log2(N) links on average. Therefore, on average, there shouldn't be any collisions. How many links are in the bisection?

64 Winter 2006 Butterflies

65 Winter 2006 Hypercubes If we take the original butterfly and collapse each straight column of switches into a single switch of degree log2(N), we get something close to a hypercube arrangement. We can cross dimensions of the hypercube to get from source to destination. A hypercube can embed lower-dimensional meshes. Text p. 778.
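A minimal sketch of "crossing dimensions" as described above: the XOR of the source and destination addresses marks the dimensions that differ, and crossing them in a fixed ascending order is the usual dimension-order scheme (an assumption here, not stated on the slide).

```python
# Dimension-order routing in a d-dimensional hypercube.

def hypercube_route(src: int, dst: int, d: int):
    """Return the sequence of nodes visited from src to dst in a d-cube."""
    path, node = [src], src
    for dim in range(d):                      # fixed dimension order
        if (node ^ dst) & (1 << dim):         # this dimension still differs
            node ^= (1 << dim)                # cross it
            path.append(node)
    return path

print(hypercube_route(0b000, 0b101, 3))       # [0, 1, 5], i.e. 000 -> 001 -> 101
```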

66 Winter 2006 Hypercubes From: http://linux.cs.sonoma.edu/~ravi/ces516sp04/Lectures/feb18.ppt

67 Winter 2006 Some Example Architectures

68 Winter 2006

69 Routing Routing from source to destination is of primary importance in parallel computing. We have already seen some examples of how a relative address is formed. In the case of a d = 3 cube, the relative address gives the shortest path in all three dimensions.

70 Winter 2006 Routing The routing algorithm decides, at each switch element, which output port to place the packet on. There are three ways to determine the output port based on the packet header: arithmetic, source-based port select, and table lookup.

71 Winter 2006 Routing Arithmetic, 2D mesh: each relative address contains the distance to be traveled in both the x and y directions, [Δx, Δy]. At switch (i, j) the routing rule is to move in the x direction while Δx ≠ 0 (stepping it toward zero), then in the y direction while Δy ≠ 0, and to deliver the packet to the local host once both are zero, as sketched below.
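A minimal sketch of this arithmetic, dimension-order decision at one switch; the port names (+x, -x, +y, -y, host) and the function name are illustrative assumptions.

```python
# One routing decision on a 2D mesh: the header carries the remaining
# offsets [dx, dy]; x is exhausted first, then y, then deliver locally.

def route_2d(dx: int, dy: int):
    """Return (output_port, new_dx, new_dy) for one switch decision."""
    if dx > 0:
        return "+x", dx - 1, dy
    if dx < 0:
        return "-x", dx + 1, dy
    if dy > 0:
        return "+y", dx, dy - 1
    if dy < 0:
        return "-y", dx, dy + 1
    return "host", 0, 0                       # arrived: deliver to the local node

# Follow a packet with relative address [2, -1] hop by hop.
dx, dy = 2, -1
while True:
    port, dx, dy = route_2d(dx, dy)
    print(port)                               # +x, +x, -y, host
    if port == "host":
        break
```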

72 Winter 2006 Routing The switch looks at the routing information in the packet and modifies the remaining distance in the appropriate direction. Dimension-order routing takes each dimension in turn. Source-based routing can also be used, in which the source node places the sequence of switch port numbers in the header. This is simple from the switch's side, but the header may be of variable size and may be large.

73 Winter 2006 Routing Table-Driven Routing: similar to the Internet and WANs. Switches keep a table of information used for routing, and the header contains an index into the table that selects the proper output port. Tables must be updated with switch-specific messages, and the table must be established in the first place.

74 Winter 2006 Routing Deterministic Routing: the route is determined solely by the source and destination; the status of the network is not considered. Dimension-ordered routing is such an example. In the case of a 2D mesh, how else could we route a packet? ANS: if one dimension were blocked, we could switch to the other dimension, etc.

75 Winter 2006 Routing Adaptive: The route of the packet is determined by source and destination, but may be influenced by network conditions. In the previous case we could zig-zag across the mesh if both dimensions on the edges were congested.

76 Winter 2006 Deadlock Deadlock: occurs when a packet waits for an event that cannot occur. Livelock: occurs when a packet keeps being routed but never arrives at its destination. Indefinite postponement: occurs when a packet waits for an event that can occur but never does.

77 Winter 2006 Deadlock Example

78 Winter 2006 Virtual Channels One way of avoiding such deadlock is to implement virtual channels. Virtual channels are used in wormhole routing and give each physical channel multiple buffers. Assume we have 2 virtual channels in the previous example: packets at a node numbered higher than their destination are placed in the high channel, and the opposite for lower destinations.

79 Winter 2006 Virtual Channels

80 Winter 2006 Adaptive Routing

81 Winter 2006 Other Topics In Interconnection Networks Turn-Model Routing Switch Design Channel Buffers Flow Control There are differences between LANs and interconnection networks for parallel processing. Global Communications.

82 Winter 2006 SGI Origin Network Named SPIDER (we'll see why when we look at its structure); it supports 1.56 GB/s of total bandwidth in both directions. Each switch contains 6 pairs of unidirectional links. Two nodes are connected to each switch, leaving 4 links for connections to other switches. Routing is table based, so message priority is supported.

83 Winter 2006 SGI Origin Network

