
1 Lecture 24: Interconnection Networks. Topics: communication latency, centralized and decentralized switches (Sections 8.1 – 8.5)

2 RAID Summary
- RAID 1-5 can tolerate a single fault: mirroring (RAID 1) has a 100% capacity overhead, while parity (RAID 3, 4, 5) has modest overhead
- Multiple faults can be tolerated by having multiple check functions; each additional check can cost an additional disk (RAID 6)
- RAID 6 and RAID 2 (memory-style ECC) are not commercially employed

3 I/O Performance
- Throughput (bandwidth) and response time (latency) are the key performance metrics for I/O
- The description of the hardware characterizes maximum throughput and average response time (usually with no queueing delays)
- The description of the workload characterizes the "real" throughput; corresponding to this throughput is an average response time

4 Throughput vs. Response Time
- As load increases, throughput increases (utilization is high); simultaneously, response time also goes up because the probability of having to wait for service goes up: a trade-off between throughput and response time
- In systems involving human interaction there are three relevant delays: data entry time, system response time, and think time; studies have shown that improvements in system response time also shorten think time → better response time and much better throughput
- Most benchmark suites try to determine throughput while placing a restriction on response times

5 Estimating Response Time
- Queueing theory provides results that can characterize some random processes
- Little's Law: mean number of tasks in system = arrival rate × mean response time
- The following two results hold for workloads whose interarrival times follow a Poisson distribution, P(k tasks arrive in time interval t) = e^(-a) a^k / k!, where a = arrival rate × t:
- Time_queue = Time_server × server utilization / (1 - server utilization)
- Length_queue = server utilization^2 / (1 - server utilization)
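The queueing results above can be sketched in a few lines of Python. This is a minimal M/M/1 sketch; the function and parameter names are illustrative, and `a = arrival rate × t` in the Poisson formula:

```python
import math

def poisson_pmf(k, a):
    """P(k tasks arrive in interval t) = e^-a * a^k / k!, with a = rate * t."""
    return math.exp(-a) * a ** k / math.factorial(k)

def mm1_metrics(arrival_rate, service_time):
    """M/M/1 results from the slide (illustrative parameter names)."""
    util = arrival_rate * service_time                 # server utilization
    assert util < 1, "queue grows without bound at utilization >= 1"
    time_queue = service_time * util / (1 - util)      # mean time waiting in queue
    length_queue = util ** 2 / (1 - util)              # mean number waiting in queue
    return util, time_queue, length_queue

# Little's Law sanity check: tasks waiting = arrival rate * time waiting
u, tq, lq = mm1_metrics(arrival_rate=10.0, service_time=0.05)
assert abs(lq - 10.0 * tq) < 1e-12
```

Note the knee of the curve: as utilization approaches 1, both Time_queue and Length_queue blow up, which is the throughput/response-time trade-off of the previous slide.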

6 Interconnect Types
(figure: interconnect types arranged by number of autonomous systems connected, on a scale from 1 to 10,000: Bus, SAN, LAN, WAN)

7 Communication
- Applications typically send messages through the OS
- Steps to send a message: copy the data into an OS buffer; the OS attaches a header and trailer; the OS sends a transfer command to the network interface hardware
- Steps to receive a message: copy the data from the network interface hardware into an OS buffer; verify the data and send an acknowledgment; copy the data into user space and signal the application
- If the sender does not see an ack in time, it resends the data
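A toy sketch of the framing and verification steps above, assuming a hypothetical fixed header and a CRC32 checksum as the trailer (the real OS networking path is far more involved):

```python
import zlib

HEADER = b"\x01\x02DST0"  # hypothetical fixed-size header (addresses, type, ...)

def build_frame(payload: bytes) -> bytes:
    """OS side of a send: attach a header and a trailer (here a CRC32 checksum)."""
    trailer = zlib.crc32(payload).to_bytes(4, "big")
    return HEADER + payload + trailer

def receive_frame(frame: bytes):
    """OS side of a receive: strip the header, verify the checksum."""
    payload, trailer = frame[len(HEADER):-4], frame[-4:]
    ok = zlib.crc32(payload).to_bytes(4, "big") == trailer
    return (payload if ok else None), ok  # ok=False means no ack -> sender resends
```

A corrupted frame fails the checksum comparison, no acknowledgment is sent, and the sender's timeout triggers the retransmission described in the last bullet.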

8 Latency
- Sender overhead: the processing delays to move application data to the network interface hardware
- Time of flight: the latency for the first bit to travel from sender to receiver; a function of distance and the speed of light
- Transmission time: the time between arrival of the first and last bits; a function of message size and network bandwidth
- Transport latency: the sum of time of flight and transmission time
- Receiver overhead: the processing delays to move data from the hardware to the application; usually longer than sender overhead
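These components simply add up. The sketch below assumes a signal propagation speed of about 2×10^8 m/s (roughly 2/3 the speed of light, typical of fiber or copper); all names are illustrative:

```python
def total_latency(msg_bytes, distance_m, bandwidth_bytes_per_s,
                  sender_overhead_s, receiver_overhead_s,
                  propagation_speed=2e8):  # ~2/3 c, an assumed medium speed
    """Sum of the four latency components listed in the slide."""
    time_of_flight = distance_m / propagation_speed          # first bit in transit
    transmission_time = msg_bytes / bandwidth_bytes_per_s    # first bit to last bit
    transport_latency = time_of_flight + transmission_time
    return sender_overhead_s + transport_latency + receiver_overhead_s
```

For a 1 MB message over 200 m at 100 MB/s, transmission time (10 ms) dominates time of flight (1 µs), which is typical inside a machine room; over a WAN the balance flips.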

9 Topologies
- Internet topologies are not very regular: they grew incrementally
- Supercomputers have regular interconnect topologies and trade off cost for high bandwidth
- Nodes can be connected with a centralized switch (all nodes have input and output wires going to a centralized chip that internally handles all routing) or a decentralized switch (each node is connected to a switch that routes data to one of a few neighbors)

10 Centralized Crossbar Switch
(figure: processors P0–P7 connected through a centralized crossbar switch)

11 Centralized Crossbar Switch
(figure: the crossbar example continued, processors P0–P7)

12 Crossbar Properties
- Assuming each node has one input and one output, a crossbar can provide maximum bandwidth: N messages can be sent simultaneously as long as there are N unique sources and N unique destinations
- Maximum overhead: W × N^2 internal crosspoint switches, where W is the data width and N is the number of nodes
- To reduce overhead, use smaller switches as building blocks, trading overhead for lower effective bandwidth
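The quadratic crosspoint cost is easy to see numerically (a hypothetical helper, not from the slides):

```python
def crossbar_crosspoints(n_nodes, data_width):
    """A full crossbar needs W * N^2 internal crosspoint switches."""
    return data_width * n_nodes ** 2

# Doubling the node count quadruples the crosspoint cost,
# which is what motivates multi-stage networks built from small switches.
assert crossbar_crosspoints(16, 32) == 4 * crossbar_crosspoints(8, 32)
```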

13 Switch with Omega Network
(figure: processors P0–P7 connected through a multi-stage omega network; inputs and outputs labeled with 3-bit addresses 000–111)

14 Omega Network Properties
- The switch complexity is now O(N log N)
- Contention increases: P0 → P5 and P1 → P7 cannot happen concurrently (this was possible in a crossbar)
- To deal with contention, the number of stages can be increased (redundant paths): by mirroring the network, we can route from P0 to P5 via N intermediate nodes, while increasing complexity by a factor of two
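The standard way to route through an omega network is destination-tag routing: at stage i, each 2×2 switch inspects bit i of the destination address (most significant bit first) and forwards to its upper output on a 0 or its lower output on a 1. A minimal sketch, assuming n is a power of two and all names are illustrative:

```python
def omega_switch_count(n):
    """An omega network for n nodes uses (n/2) * log2(n) 2x2 switches."""
    return (n // 2) * (n.bit_length() - 1)

def omega_route(dest, n=8):
    """Destination-tag routing: stage i inspects bit i of the destination
    (MSB first) and picks the switch's upper output on 0, lower on 1."""
    stages = n.bit_length() - 1                  # log2(n) switching stages
    bits = [(dest >> (stages - 1 - i)) & 1 for i in range(stages)]
    return ["lower" if b else "upper" for b in bits]
```

Because the path is determined entirely by the destination bits, two messages whose paths need the same switch output in some stage conflict, which is the contention noted above.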

15 Tree Network
- Complexity is O(N)
- Can yield low latencies when communicating with nearby nodes
- Can build a fat tree by having multiple incoming and outgoing links per node
(figure: processors P0–P7 at the leaves of a binary tree)

16 Bisection Bandwidth
- Split the N nodes into two groups of N/2 nodes such that the bandwidth between the two groups is minimized: that minimum is the bisection bandwidth
- Why it is relevant: if traffic is completely random, the probability that a message crosses between the two halves is 1/2, so if all N nodes send a message, the bisection will have to carry N/2 messages on average
- The concept of bisection bandwidth confirms that the tree network is not suited for random traffic patterns, but rather for localized traffic patterns
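The claim that random traffic places N/2 messages across the bisection can be checked with a small Monte Carlo sketch (names and parameters are illustrative):

```python
import random

def average_cross_traffic(n, trials=20000, seed=0):
    """Monte Carlo estimate: each of n nodes sends one message to a uniformly
    random destination; count messages crossing a fixed half/half split."""
    rng = random.Random(seed)
    half, total = n // 2, 0
    for _ in range(trials):
        for src in range(n):
            dst = rng.randrange(n)
            if (src < half) != (dst < half):   # message crosses the bisection
                total += 1
    return total / trials

# For n = 16 the estimate comes out close to n/2 = 8
```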

17 Distributed Switches: Ring
- Each node is connected to a 3×3 switch that routes messages between the node and its two neighbors
- Effectively a repeated bus: multiple messages can be in transit simultaneously
- Disadvantage: a bisection bandwidth of only 2, and messages may need up to N/2 hops (N/4 on average)

18 Distributed Switch Options
- Performance can be increased by throwing more hardware at the problem: in a fully-connected network every switch is connected to every other switch, giving N^2 wiring complexity and N^2/4 bisection bandwidth
- Most commercial designs adopt a point between the two extremes (ring and fully-connected):
- Grid: each node connects with its N, E, W, S neighbors
- Torus: a grid whose connections wrap around at the edges
- Hypercube: links between nodes whose binary names differ in a single bit

19 Topology Examples
(figures: grid, torus, hypercube)

For 64 nodes:

Criteria                   Bus   Ring   2D torus   6-cube   Fully connected
Performance
  Bisection bandwidth        1      2         16       32              1024
Cost
  Ports/switch               -      3          5        7                64
  Total links                1    128        192      256              2080
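The table's numbers can be recomputed from first principles. The sketch below assumes 64 nodes, bidirectional links, and counts both node-to-switch and switch-to-switch links (bus omitted, since it has no per-node switches):

```python
import math

def topology_metrics(n=64):
    """Recompute the table's numbers for n nodes (n a power of two and a
    perfect square). Links = node-to-switch plus switch-to-switch."""
    k = math.isqrt(n)               # side length of the 2D torus
    d = int(math.log2(n))           # hypercube dimension
    return {
        "ring":            {"bisection": 2,           "ports": 3,     "links": n + n},
        "2d_torus":        {"bisection": 2 * k,       "ports": 5,     "links": n + 2 * n},
        "hypercube":       {"bisection": n // 2,      "ports": d + 1, "links": n + n * d // 2},
        "fully_connected": {"bisection": (n // 2)**2, "ports": n,     "links": n + n * (n - 1) // 2},
    }
```

For example, the ring's 128 links are 64 node-to-switch links plus 64 switch-to-switch links, and the torus bisection of 16 comes from cutting 8 columns twice because of the wrap-around connections.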


