Presentation is loading. Please wait.

Presentation is loading. Please wait.

Copyright 2004 Koren & Krishna ECE655/Koren Part.8.1 UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering Fault Tolerant Computing ECE.

Similar presentations


Presentation on theme: "Copyright 2004 Koren & Krishna ECE655/Koren Part.8.1 UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering Fault Tolerant Computing ECE."— Presentation transcript:

1 Copyright 2004 Koren & Krishna ECE655/Koren Part.8.1 UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering Fault Tolerant Computing ECE 655 Part 8 Networks - 3

2 Copyright 2004 Koren & Krishna ECE655/Koren Part.8.2 Hypercube Networks  H n - An n-dimensional hypercube network - 2 nodes  A 0-dimensional hypercube H 0 - a single node  H n constructed by connecting the corresponding nodes of two H n-1 networks  The edges added to connect corresponding nodes are called dimension-(n-1) edges n Dimension-0 edge H 1

3 Copyright 2004 Koren & Krishna ECE655/Koren Part.8.3 Hypercube - Examples Dimension-0 edges Dimension-1 edges Dimension-2 edges Dimension-3 edges H 4 H 1 H 2 H 3 H 3

4 Copyright 2004 Koren & Krishna ECE655/Koren Part.8.4 Routing in Hypercubes  Specific numbering to simplify routing  Number expressed in binary - if nodes i and j are connected by a dimension-k edge, the names of i and j differ in only the k-th bit position  Example - nodes 0000 and 0010 differ in only the 2 bit position - connected by a dimension-1 edge  Example - a packet needs to travel from node 14=1110 to node 2=0010 in an H network  Possible routings -  1110  0110 (dimension 3)  0010 (dimension 2)  1110  1010 (dimension 2)  0010 (dimension 3) 2 4 2 Dimension-0 edges Dimension-1 edges Dimension-2 edges Dimension-3 edges 1

5 Copyright 2004 Koren & Krishna ECE655/Koren Part.8.5 Routing - General  In general - the distance between source and destination is the number of different bits in addresses  Going from X to Y can be accomplished by traveling once along each dimension in which they differ  X = x... x ; Y=y... y  Define z = x  y -  is the exclusive-or operator  Packet must traverse an edge in every dimension i for which z = 1 n- 1 0 0 i i i i

6 Copyright 2004 Koren & Krishna ECE655/Koren Part.8.6 Fault Tolerance in Hypercubes  Hn (for n  2) can tolerate link failures - multiple paths from any source to any destination  Node failures can disrupt the operation  One way is to increase the number of communication ports of each node from n to n+1 and connecting these extra ports through additional links to one or more spare nodes  Example - two spare nodes - each a spare for 2 nodes of an Hn- 1 sub-cube  Spare nodes may require 2 ports - can be reduced by using several crossbar switches whose outputs is connected to the corresponding spare node  Number of ports of the spare node is reduced to n+ 1 - same as for all other nodes n- 1

7 Copyright 2004 Koren & Krishna ECE655/Koren Part.8.7 An H 4 Hypercube with Two Spare Nodes S S

8 Copyright 2004 Koren & Krishna ECE655/Koren Part.8.8 Different Method of Fault-Tolerance  Duplicating the processor in a few selected nodes  Each additional processor - spare also for any of the processors in the neighboring nodes  Example - nodes 0, 7, 8, 15 in H 4 - modified to duplex nodes  Every node now has a spare at a distance no larger than 1  Replacing a faulty processor by a spare results in an additional communication delay

9 Copyright 2004 Koren & Krishna ECE655/Koren Part.8.9 Routing in Injured Hypercubes  Routing algorithm must be modified to route around the faulty nodes or links  Basic idea - list the dimensions along which the packet must travel, and traverse them one by one  As edges are traversed and are crossed off the list  If, due to a link or a node failure, the desired link is not available - another edge in the list, if any, is chosen for traversal  If packet arrives at some node to find all dimensions on its list down - it backtracks to the previous node and tries again

10 Copyright 2004 Koren & Krishna ECE655/Koren Part.8.10 Formal Routing Algorithm - Notations  TD - list of dimensions that the message has traveled on - in order of traversal; TD - in reversed order   - exclusive-or operation carried out k times, sequentially  Example -  a means (a  a )  a  D - destination, S - source, d=D  S (  - bitwise exclusive-or operation on corresponding bits of D and S)  SC(A) - set of nodes visited if we travel on each of the dimensions listed in set A  Example - at node 0010 - SC(1,3)={0000,1000} R k i 3 1 2 3 i=1

11 Copyright 2004 Koren & Krishna ECE655/Koren Part.8.11 Notations - Cont.  e - n-bit vector consisting of a 1 in the i-th bit position and 0 everywhere else  Example - e = 100  Packets are assumed to consist of  (I) d; d=D  S  (II) Message being transmitted (the ``payload'')  (III) List of dimensions taken so far - TD   - append operation  TD  x - append x to the list TD  transmit(j) - send packet (d  e, message, TD  j) along the j-th-dimensional link from the present node i 2 n j 3

12 Copyright 2004 Koren & Krishna ECE655/Koren Part.8.12 Routing Algorithm for Injured Hypercubes

13 Copyright 2004 Koren & Krishna ECE655/Koren Part.8.13 Example - H 3  H 3 with faulty node 011  node 000 wants to send a packet to 111  At 000, d=111 - sends the message out on dimension-0, to node 001  At 001, d = 110 and TD=(0) - attempts dimension-1 edge - impossible  Bit 2 of d is also 1 - checks and finds that the dimension-2 edge to 101 is available - message is sent to 101 and then to 111  Exercise - What if both 011 and 101 are down? Dimension-0 edges Dimension-1 edges Dimension-2 edges

14 Copyright 2004 Koren & Krishna ECE655/Koren Part.8.14 Reliability of Point-to-Point Networks  Not necessarily a regular structure - often more than one path between any two nodes  Terminal Reliability - the probability that there exists an operational path between two specific nodes, given the probabilities of link failures  Example - calculating the terminal reliability for the source-sink pair N - N 4 1

15 Copyright 2004 Koren & Krishna ECE655/Koren Part.8.15 Terminal Reliability - Example I  Three paths from N to N  P ={X,X }  P ={X,X,X }  p (q ) - probability that link X is good (faulty)  Nodes are assumed fault-free - if not, their failure probability is incorporated into outgoing links  Set of paths must be modified to an equivalent set of mutually exclusive events - otherwise some events will be counted more than once  Mutually exclusive events - (I) P up ; (II) P up and P down ; (III) P up and both P and P down  4 1 1 1,2 2,4 2 3 3,4 1,3 1,2 2,3 3,4 i,j 3 2 1 1 1 2

16 Copyright 2004 Koren & Krishna ECE655/Koren Part.8.16 Calculating Terminal Reliability  Terminal reliability of a network with m paths P,..., P from source to sink  E (E ) - event in which path P is operational (faulty)  R = P(Operational Path Exists) = P(  E )  Set of events can be decomposed into mutually exclusive events -  O. P. Exists = E  ( E  E )  ( E  E  E ) ...  ( E  E  …  E )  R = P ( E ) +P ( E  E ) +P ( E  E  E ) +... + P ( E  E  …  E ) 1 m m i=1 i i i 1 i 1 1 1 2 2 m-1 m 3 1 1 1 1 2 2 3 m _ _ _ _ _ _ _ _ _ _ _

17 Copyright 2004 Koren & Krishna ECE655/Koren Part.8.17 Terminal Reliability - Cont.  The last expression can be rewritten using conditional probabilities  R = P ( E ) +P ( E ) P ( E /E ) +P ( E ) P ( E  E /E ) +... + P ( E )P(E  …  E /E )  The problem is calculating the probabilities P(E  …  E /E )  To identify the links which must must fail so that E occurs but not E,…, E, conditional sets are used  S = P - P = { x | x  P and x  P }  Identifying disjoint events in the general case is not always straightforward 1 1 1 2 2 2 m 3 3 1 m-1 m i-1 1 i _ _ _ _ __ _ i 1 j/i i i j j

18 Copyright 2004 Koren & Krishna ECE655/Koren Part.8.18 Terminal Reliability - Example II  This six-node network has 9 links - 6 uni-directional and 3 bi-directional  All paths from N 1 to N 6 -  Paths are ordered from shortest to longest

19 Copyright 2004 Koren & Krishna ECE655/Koren Part.8.19 Calculating Terminal Reliability for Example II  The first term for the reliability equation is P(E )=p p p  To calculate the second term in the reliability equation - the conditional set is used  S = P - P = {x, x }  At least one of the links in this set must fail so that P is faulty (while P is operational)  The second term in the probability equation - p p p (1-p p ) 1/2 2 1 1,3 3,5 1 2 1,2 3,5 1,3 5,6 2,5 1 5,6 1,3 3,5

20 Copyright 2004 Koren & Krishna ECE655/Koren Part.8.20 Example II - Cont.  For calculating other terms in the sum - intersection of several conditional sets must be considered  Calculating the fourth term - expression for P - the conditional sets are: S ={x }; S ={x,x,x }; S ={x,x }  S is included in S - if S is faulty, S is faulty  S can be ignored  The fourth term in the reliability equation -  p p p p (1-p )(1-p p ) 3/4 1/4 5,6 4 2/4 2,5 1,3 1,2 5,6 2,4 3,5 4,5 4,6 5,6 1,2 2,4 1,2 1/4 2/4

21 Copyright 2004 Koren & Krishna ECE655/Koren Part.8.21 Example II - Cont.  Calculating the third term S = {x,x,x } ; S = {x,x }  The two conditional sets are not disjoint  The event both S and S are faulty needs to be divided into disjoint events:  (I) x is faulty  (II) x is operational and both x and x are faulty  (III) Both x and x are up, and both x and x are faulty  Resulting expression for third term  p p p (q + p q q + p p q q )  Remaining terms - calculated similarly  Terminal reliability is the sum of all thirteen terms 2/3 1/3 1,3 5,6 3,5 2,5 5,6 1/3 2/3 5,6 1,3 1,2 1,3 2,5 3,5 2,5 5,6 3,5 2,4 4,6 1,3 2,5


Download ppt "Copyright 2004 Koren & Krishna ECE655/Koren Part.8.1 UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering Fault Tolerant Computing ECE."

Similar presentations


Ads by Google