DISTRIBUTED SYSTEM UNIT-2: Theoretical Foundation for Distributed Systems. Prepared By: G.S.Mishra.


1 DISTRIBUTED SYSTEM UNIT-2: Theoretical Foundation for Distributed Systems. Prepared By: G.S.Mishra

2 LIMITATIONS OF DISTRIBUTED SYSTEMS
- Absence of shared memory
- Absence of a global clock
Solution: ordering of events and logical clocks

3 ORDERING OF EVENTS
Lamport's "happened before" relation: for two events a and b, a → b if
- a and b are events in the same process and a occurred before b, or
- a is the event of sending a message m and b is the corresponding receive event at the destination process.
If a → b and b → c for some event c, then a → c (the relation is transitive).

4 ORDERING OF EVENTS CONTD..
- a → b implies a is a potential cause of b.
- Causal ordering captures potential dependencies: the "happened before" relation causally orders events.
- If a → b, then a causally affects b.
- If neither a → b nor b → a holds, then a and b are concurrent, written a || b.

5 LOGICAL CLOCK
Each process P_i keeps a clock C_i.
IR1: Clock C_i is incremented between any two successive events in process P_i: C_i := C_i + d (d > 0). If a and b are two successive events in P_i and a → b, then C_i(b) = C_i(a) + d.
IR2: If event a is the sending of message m by process P_i, then m is assigned the timestamp t_m = C_i(a). When process P_j receives m, C_j is set to a value greater than or equal to its present value and greater than t_m: C_j := max(C_j, t_m + d) (d > 0).
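The two rules can be sketched in a few lines of Python. This is only an illustration under assumptions: the class name LamportClock and its method names are invented here, d defaults to 1, and receive applies IR2 exactly as written on the slide.

```python
class LamportClock:
    """Scalar logical clock following IR1 and IR2 (names are illustrative)."""

    def __init__(self, d=1):
        self.d = d          # increment, must be > 0
        self.time = 0

    def tick(self):
        # IR1: advance the clock for a local event.
        self.time += self.d
        return self.time

    def send(self):
        # A send is an event, so tick first; the message carries t_m = C_i(a).
        return self.tick()

    def receive(self, t_m):
        # IR2: C_j := max(C_j, t_m + d).
        self.time = max(self.time, t_m + self.d)
        return self.time


p1, p2 = LamportClock(), LamportClock()
t_m = p1.send()                 # first event at P1
p2.tick()                       # one local event at P2
received_at = p2.receive(t_m)   # P2's clock jumps past the timestamp
```

After the receive, P2's clock is strictly greater than the timestamp carried by the message, which is what makes a → b imply C(a) < C(b).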

6 EXAMPLE-1

7 EXAMPLE-2

8 FIND OUT THE CLOCK VALUE OF EACH PROCESS AT ITS LAST EVENT

9 LIMITATION OF LAMPORT'S CLOCK
a → b implies C(a) < C(b), BUT C(a) < C(b) doesn't imply a → b! So it is not a true clock.

10 VECTOR CLOCKS
For process P_i, C_i is a vector of size n (the number of processes).
Update rules:
[IR1] Clock C_i is incremented between any two successive events in process P_i: C_i[i] := C_i[i] + d (d > 0).
[IR2] If a is the sending of message m from P_i to P_j with vector timestamp t_m = C_i(a), then on receipt of m: C_j[k] := max(C_j[k], t_m[k]) for all k.
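A minimal sketch of these update rules, assuming 0-based process ids and d = 1; the class and method names are hypothetical. The receive step applies only the component-wise maximum stated in IR2 (some formulations also increment the receiver's own entry, which is not done here).

```python
class VectorClock:
    """Vector clock for one process out of n (illustrative names)."""

    def __init__(self, pid, n, d=1):
        self.pid = pid
        self.d = d
        self.clock = [0] * n

    def tick(self):
        # IR1: only the process's own component advances.
        self.clock[self.pid] += self.d
        return list(self.clock)

    def send(self):
        # The message carries a copy of the clock after the send event.
        return self.tick()

    def receive(self, t_m):
        # IR2 as stated: component-wise maximum with the message timestamp.
        self.clock = [max(c, t) for c, t in zip(self.clock, t_m)]
        return list(self.clock)


p0, p1 = VectorClock(0, 3), VectorClock(1, 3)
t_m = p0.send()            # P0 sends with timestamp [1, 0, 0]
p1.tick()                  # local event at P1 -> [0, 1, 0]
merged = p1.receive(t_m)   # P1 now knows about P0's first event
```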

11 EXAMPLE-3

12 EXAMPLE-4

13 COMPARISON OF TIMESTAMPS
For events a and b with vector timestamps t_a and t_b:
- t_a = t_b iff for all i, t_a[i] = t_b[i]
- t_a ≠ t_b iff for some i, t_a[i] ≠ t_b[i]
- t_a ≤ t_b iff for all i, t_a[i] ≤ t_b[i]
- t_a ≰ t_b iff for some i, t_a[i] > t_b[i]
- t_a < t_b iff (t_a ≤ t_b and t_a ≠ t_b)
- t_a || t_b iff (t_a ≮ t_b and t_b ≮ t_a)
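These comparisons translate directly into code. The helper names below are assumptions made for this sketch; concurrency is taken to mean that neither timestamp is less than the other.

```python
def vt_leq(ta, tb):
    # ta <= tb iff every component of ta is <= the matching component of tb.
    return all(x <= y for x, y in zip(ta, tb))


def vt_less(ta, tb):
    # ta < tb iff ta <= tb and the two vectors differ somewhere.
    return vt_leq(ta, tb) and tuple(ta) != tuple(tb)


def vt_concurrent(ta, tb):
    # ta || tb iff neither vector is less than the other.
    return not vt_less(ta, tb) and not vt_less(tb, ta)


before = vt_less((1, 0, 0), (1, 1, 1))        # causally ordered pair
parallel = vt_concurrent((0, 1, 1), (1, 0, 0))  # incomparable pair
```

The pair (0, 1, 1) and (1, 0, 0) is incomparable because each vector is larger than the other in at least one component.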

14 CONCLUSION
- In the system of vector clocks, a → b iff t_a < t_b.
- Events a and b are causally related iff t_a < t_b or t_b < t_a; otherwise they are concurrent.
- So vector clocks causally order events.
- Vector clocks can therefore be used to order messages.

15 CAUSAL ORDERING OF MESSAGES: APPLICATION OF VECTOR CLOCKS
- If send(m1) → send(m2), then every recipient of both m1 and m2 must "deliver" m1 before m2.
- "Deliver" means the message is actually given to the application for processing.
- Send(M1) → Send(M2) requires Receive(M1) → Receive(M2).

16 VIOLATION OF CAUSAL ORDERING OF MESSAGES

17 SOLUTION
Upon arrival of a message at a process, buffer the message (delay its delivery) until the message immediately preceding it has been delivered.
Birman-Schiper-Stephenson Protocol: enforcing causal ordering of messages
- Assumes broadcast communication channels that do not lose or corrupt messages.
- Uses vector clocks to "count" the number of messages, with d = 1.
- There are n processes.

18 VECTOR TIME
- When P_i begins to execute, C_i is initialized to zero.
- For each event send(m) at P_i, C_i[i] is incremented by 1.
- The timestamp t_m = C_i is sent along with m.
- When process P_j delivers a message m from P_i, P_j updates its vector clock: for all k ∈ {1, 2, ..., n}, C_j[k] := max(C_j[k], t_m[k]).
(Note: Recv(m) → Deliver(m).)

19 THE PROTOCOL:
Process P_i updates its vector time C_i and broadcasts message m with timestamp t_m = C_i. So C_i[i] - 1 is the number of messages P_i sent before m.

20 Process P_j (j ≠ i), upon receiving message m with timestamp t_m, buffers the message until all messages sent by P_i preceding m have arrived, i.e. C_j[i] = t_m[i] - 1

21 and P_j has received all messages that P_i had received before sending m, i.e. C_j[k] ≥ t_m[k] for k = 1, 2, ..., n, k ≠ i.

22 EXAMPLE
When the message is finally delivered at P_j, vector time C_j is adjusted according to vector clock rule 2.
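The buffering and delivery conditions above can be sketched as follows. This is a simplified single-machine illustration, not a networked implementation: the class BSSProcess, its methods, and the 0-based process ids are assumptions, and "broadcasting" simply returns a message tuple that the caller hands to other processes.

```python
class BSSProcess:
    """One process in a causal-broadcast group of n (illustrative sketch)."""

    def __init__(self, pid, n):
        self.pid = pid
        self.clock = [0] * n
        self.buffer = []        # messages waiting for their predecessors
        self.delivered = []     # payloads in delivery order

    def broadcast(self, payload):
        # Increment own entry and timestamp the message with the whole vector.
        self.clock[self.pid] += 1
        return (self.pid, list(self.clock), payload)

    def _deliverable(self, sender, t_m):
        # Condition 1: m is the next message expected from the sender.
        if self.clock[sender] != t_m[sender] - 1:
            return False
        # Condition 2: everything the sender had seen has been delivered here.
        return all(self.clock[k] >= t_m[k]
                   for k in range(len(t_m)) if k != sender)

    def receive(self, message):
        self.buffer.append(message)
        progress = True
        while progress:                      # re-check buffer after each delivery
            progress = False
            for msg in list(self.buffer):
                sender, t_m, payload = msg
                if self._deliverable(sender, t_m):
                    self.buffer.remove(msg)
                    self.clock = [max(c, t) for c, t in zip(self.clock, t_m)]
                    self.delivered.append(payload)
                    progress = True


p0, p1, p2 = BSSProcess(0, 3), BSSProcess(1, 3), BSSProcess(2, 3)
m1 = p0.broadcast("m1")
p1.receive(m1)
m2 = p1.broadcast("m2")      # causally after m1
p2.receive(m2)               # arrives first, so it is buffered
held = list(p2.delivered)
p2.receive(m1)               # m1 is delivered, which releases m2
```

In this run P2 sees m2 before m1, holds it because C_2[0] is still behind t_m[0], and delivers both in causal order once m1 arrives.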

23 SES ALGORITHM
SES: Schiper-Eggli-Sandoz Algorithm.
- No need for broadcast messages.
- Each process maintains a vector V_P of size N - 1, where N is the number of processes in the system.
- V_P is a vector of tuples (P', t): P' is the destination process id and t is a vector timestamp.
- Tm: logical time of sending message m.
- Tpi: present logical time at pi.

24 SES ALGORITHM CONTD...
Sending a message (from P1 to P2):
- Send message M, timestamped tm, along with V_P1 to P2.
- Insert (P2, tm) into V_P1, overwriting the previous value of (P2, t), if any.
- Any future message carrying (P2, tm) in V_P1 cannot be delivered to P2 until tm < Tp2.
Delivering a message (at P2):
- If V_M (in the message) does not contain any pair (P2, t), the message can be delivered.
- Otherwise (P2, t) exists: if t is not less than Tp2 (t ≮ Tp2), buffer the message (don't deliver); else deliver it.

25 SES ALGORITHM CONTD...
On delivering the message:
- Merge V_M (in the message) with V_P2 as follows: if (P, t) is not in V_P2, add it; if (P, t) is present in V_P2, update t to max(t in V_M, t in V_P2).
- Update site P2's local logical clock.
- Check the buffered messages after the local logical clock update.

26 SES ALGORITHM CONTD...
What does the condition t ≮ Tp2 imply?
- t is a vector timestamp carried in the message.
- t ≮ Tp2 means that t[j] > Tp2[j] for at least one j (or that t equals Tp2); it need not hold for every j.
- This implies some events occurred in other processes without P2's knowledge, so P2 decides to buffer the message.
- When t < Tp2, the message is delivered and Tp2 is updated with the help of V_P2 (after the merge operation).

27 EXAMPLE

28 e31: P3 sends message m3,1 to P2. C3 = (0, 0, 1); t3,1 ← (0, 0, 1), V3,1 ← (?, ?, ?); V3 ← [ ?, (0, 0, 1), ? ]
e21: P2 receives message m3,1 from P3. As V3,1[2] = (?, ?, ?)[2] is uninitialized, the message is accepted. V2 ← [ ?, ?, ? ] and C2 ← max[(0, 0, 0), (0, 0, 1)] = (0, 0, 1)
e22: P2 sends message m2,1 to P1. C2 ← (0, 1, 1); t2,1 ← (0, 1, 1), V2,1 ← [ ?, ?, ? ]; V2 ← [ (0, 1, 1), ?, ? ]
e11: P1 sends message m1,1 to P3. C1 ← (1, 0, 0); t1,1 ← (1, 0, 0), V1,1 ← ( ?, ?, ? ); V1 ← [ ?, ?, (1, 0, 0) ]
e32: P3 receives message m1,1 from P1. As V1,1[3] = ( ?, ?, ? )[3] is uninitialized, the message is accepted. V3 ← [ ?, (0, 0, 1), ? ] and C3 ← max[(0, 0, 1), (1, 0, 0)] = (1, 0, 1).

29 e12: P1 receives message m2,1 from P2. As V2,1[1] = [ ?, ?, ? ][1] is uninitialized, the message is accepted. V1 ← [ ?, ?, (1, 0, 0) ] and C1 ← max[(1, 0, 0), (0, 1, 1)] = (1, 1, 1)
e23: P2 sends message m2,2 to P1. C2 ← (0, 2, 1); t2,2 ← (0, 2, 1), V2,2 ← [ (0, 1, 1), ?, ? ]; V2 ← [ (0, 2, 1), ?, ? ]
e13: P1 receives message m2,2 from P2. As V2,2[1] = (0, 1, 1) < (1, 1, 1) = C1, the message is accepted. V1 ← [ ?, ?, (1, 0, 0) ] and C1 ← max[(0, 2, 1), (1, 1, 1)] = (1, 2, 1)

30 e31: P3 sends message m3,1 to P2. C3 = (0, 0, 1); t3,1 ← (0, 0, 1), V3,1 ← (?, ?, ?); V3 ← [ ?, (0, 0, 1), ? ]
e21: P2 receives message m3,1 from P3. As V3,1[2] = (?, ?, ?)[2] is uninitialized, the message is accepted.

31 V2 ← [ ?, ?, ? ] and C2 ← max[(0, 0, 0), (0, 0, 1)] = (0, 0, 1)
e22: P2 sends message m2,1 to P1. C2 ← (0, 1, 1); t2,1 ← (0, 1, 1), V2,1 ← [ ?, ?, ? ]; V2 ← [ (0, 1, 1), ?, ? ]
e11: P1 sends message m1,1 to P3. C1 ← (1, 0, 0); t1,1 ← (1, 0, 0), V1,1 ← ( ?, ?, ? ); V1 ← [ ?, ?, (1, 0, 0) ]
e32: P3 receives message m1,1 from P1. As V1,1[3] = ( ?, ?, ? )[3] is uninitialized, the message is accepted. V3 ← [ ?, (0, 0, 1), ? ] and C3 ← max[(0, 0, 1), (1, 0, 0)] = (1, 0, 1).

32 e23: P2 sends message m2,2 to P1. C2 ← (0, 2, 1); t2,2 ← (0, 2, 1), V2,2 ← [ (0, 1, 1), ?, ? ]; V2 ← [ (0, 2, 1), ?, ? ]
e12: P1 receives message m2,2 from P2. But V2,2[1] = (0, 1, 1) ≮ (1, 0, 0) = C1, so the message is queued.
e13: P1 receives message m2,1 from P2. As V2,1[1] = [ ?, ?, ? ][1] is uninitialized, the message is accepted. V1 ← [ ?, ?, (1, 0, 0) ] and C1 ← max[(1, 0, 0), (0, 1, 1)] = (1, 1, 1).

33 The message on the queue is now checked. As V2,2[1] = (0, 1, 1) < (1, 1, 1) = C1, the message is now accepted. V1 ← [ ?, ?, (1, 0, 0) ] and C1 is set to (1, 2, 1).
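The out-of-order scenario traced in slides 30 to 33 can be replayed with a small sketch. It rests on several assumptions: process ids are 0-based (P1 is 0, P2 is 1, P3 is 2), V_P is held as a dictionary from destination id to timestamp, a message is buffered whenever its entry for the receiver is not less than the receiver's clock (as in event e12), and all class and function names are invented for illustration.

```python
def vt_less(ta, tb):
    # ta < tb in the vector order: component-wise <= and not equal.
    return all(x <= y for x, y in zip(ta, tb)) and tuple(ta) != tuple(tb)


def vt_max(ta, tb):
    return [max(x, y) for x, y in zip(ta, tb)]


class SESProcess:
    def __init__(self, pid, n):
        self.pid = pid
        self.clock = [0] * n
        self.v_p = {}            # destination pid -> vector timestamp
        self.buffer = []
        self.delivered = []

    def send(self, dest, payload):
        self.clock[self.pid] += 1
        t_m = list(self.clock)
        v_m = {p: list(t) for p, t in self.v_p.items()}  # V_P before this send
        self.v_p[dest] = list(t_m)                       # overwrite (dest, t)
        return (payload, t_m, v_m)

    def _can_deliver(self, v_m):
        t = v_m.get(self.pid)
        return t is None or vt_less(t, self.clock)

    def _deliver(self, message):
        payload, t_m, v_m = message
        for p, t in v_m.items():                         # merge V_M into V_P
            if p != self.pid:
                self.v_p[p] = vt_max(self.v_p[p], t) if p in self.v_p else list(t)
        self.clock = vt_max(self.clock, t_m)
        self.delivered.append(payload)

    def receive(self, message):
        self.buffer.append(message)
        progress = True
        while progress:                                  # re-check after delivery
            progress = False
            for msg in list(self.buffer):
                if self._can_deliver(msg[2]):
                    self.buffer.remove(msg)
                    self._deliver(msg)
                    progress = True


p1, p2, p3 = SESProcess(0, 3), SESProcess(1, 3), SESProcess(2, 3)
p2.receive(p3.send(1, "m31"))     # e31, e21
m21 = p2.send(0, "m21")           # e22
p3.receive(p1.send(2, "m11"))     # e11, e32
m22 = p2.send(0, "m22")           # e23
p1.receive(m22)                   # e12: queued
queued = list(p1.delivered)
p1.receive(m21)                   # e13: m21 accepted, then m22 released
```

The final clock at P1 matches the (1, 2, 1) reached in the trace, with m2,1 delivered ahead of m2,2.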

34 PROBLEM OF VECTOR CLOCK
- Message size increases, since each message needs to be tagged with the vector.
- The size can be reduced in some cases by sending only the values that have changed.

35 Capturing Global State

36 GLOBAL STATE COLLECTION
Applications: checking "stable" properties, checkpoint and recovery.
Issues:
- Need to capture both node and channel states
- The system cannot be stopped
- There is no global clock

37 Some notation:
- LS_i: local state of process i
- send(m_ij): send event of message m_ij from process i to process j
- rec(m_ij): the corresponding receive event
- time(x): time at which state x was recorded
- time(send(m)): time at which send(m) occurred

38 send(m_ij) ∈ LS_i iff time(send(m_ij)) < time(LS_i)
rec(m_ij) ∈ LS_j iff time(rec(m_ij)) < time(LS_j)
transit(LS_i, LS_j) = { m_ij | send(m_ij) ∈ LS_i and rec(m_ij) ∉ LS_j }
inconsistent(LS_i, LS_j) = { m_ij | send(m_ij) ∉ LS_i and rec(m_ij) ∈ LS_j }

39 Global state: a collection of local states, GS = {LS_1, LS_2, ..., LS_n}
- GS is consistent iff for all i, j, 1 ≤ i, j ≤ n, inconsistent(LS_i, LS_j) = ∅
- GS is transitless iff for all i, j, 1 ≤ i, j ≤ n, transit(LS_i, LS_j) = ∅
- GS is strongly consistent if it is consistent and transitless.
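A short sketch of how these sets could be checked for a recorded global state. It takes transit to mean "send recorded but receive not recorded" and inconsistent to mean "receive recorded but send not recorded". The data layout (dictionaries keyed by channel (i, j), holding sets of message ids) and the function name are assumptions for illustration.

```python
def classify(sent, received):
    """Return (consistent, transitless, strongly_consistent).

    sent[(i, j)]     : ids of messages whose send is recorded in LS_i
    received[(i, j)] : ids of messages whose receive is recorded in LS_j
    """
    channels = set(sent) | set(received)
    in_transit = {c: sent.get(c, set()) - received.get(c, set())
                  for c in channels}
    orphans = {c: received.get(c, set()) - sent.get(c, set())
               for c in channels}
    consistent = all(not msgs for msgs in orphans.values())
    transitless = all(not msgs for msgs in in_transit.values())
    return consistent, transitless, consistent and transitless


# m1 was sent on channel (1, 2) but its receive is not yet recorded.
state_a = classify({(1, 2): {"m1"}}, {})
# m1 is recorded as received although its send is not recorded.
state_b = classify({}, {(1, 2): {"m1"}})
# Both send and receive of m1 are recorded.
state_c = classify({(1, 2): {"m1"}}, {(1, 2): {"m1"}})
```

The first state is consistent but has a message in transit, the second is inconsistent, and the third is strongly consistent.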

40 CHANDY-LAMPORT'S ALGORITHM
Global-state-detection algorithm: the Chandy-Lamport global state recording protocol (snapshot algorithm).
- The goal of this distributed algorithm is to capture a consistent global state.
- It assumes all communication channels are FIFO.
- It uses a distinguished message called a marker to start the algorithm.
P_i sends marker:
- P_i records its local state.
- For each channel C_ij on which P_i has not already sent a marker, P_i sends a marker before sending other messages.

41 CHANDY-LAMPORT ALGO.. CONTD...
P_j receives marker from P_i:
- If P_j has not recorded its state: a) it records the state of C_ij as empty; b) it sends the marker as described above. (Note: it records its local state before sending out the marker.)
- If P_j has already recorded its local state LS_j: a) it records the state of C_ij to be the sequence of messages received between the recording of LS_j and the marker from C_ij.

42 Points to Note:
- Markers sent on a channel separate the messages sent on the channel before the sender recorded its state from the messages sent after the sender recorded its state.
- The state collected may not be any state that actually happened in reality, but rather a state that "could have" happened.
- Requires FIFO channels.
- The network should be strongly connected (this also works for connected, undirected networks).
- Message complexity is O(|E|), where E is the set of links.
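The marker rules can be exercised with a tiny two-process simulation. Everything concrete here is an assumption for illustration: the application state is an integer balance, messages are integer transfers, channels are FIFO deques keyed by (source, destination), and the names Process, record, and deliver are invented.

```python
from collections import deque

MARKER = "MARKER"


class Process:
    def __init__(self, pid, state, peers):
        self.pid = pid
        self.state = state            # hypothetical application state (a balance)
        self.peers = peers
        self.has_recorded = False
        self.recorded_state = None    # LS_i once recorded
        self.channel_state = {}       # sender id -> messages recorded in transit
        self.open_channels = set()    # incoming channels still being recorded

    def record(self, network):
        # Marker sending rule: record local state, then send a marker on
        # every outgoing channel before any other message.
        self.recorded_state = self.state
        self.has_recorded = True
        self.channel_state = {p: [] for p in self.peers}
        self.open_channels = set(self.peers)
        for p in self.peers:
            network[(self.pid, p)].append(MARKER)

    def on_message(self, sender, msg, network):
        if msg == MARKER:
            if not self.has_recorded:
                self.record(network)          # channel from sender is empty
            self.open_channels.discard(sender)
        else:
            self.state += msg
            if sender in self.open_channels:  # arrived after LS, before marker
                self.channel_state[sender].append(msg)


def deliver(network, procs, src, dst):
    # Deliver the oldest message on channel (src, dst): FIFO order.
    procs[dst].on_message(src, network[(src, dst)].popleft(), network)


procs = {1: Process(1, 10, [2]), 2: Process(2, 20, [1])}
network = {(1, 2): deque(), (2, 1): deque()}

procs[2].state -= 5
network[(2, 1)].append(5)          # a transfer of 5 is in transit on C21

procs[1].record(network)           # P1 starts the snapshot
deliver(network, procs, 2, 1)      # P1 receives the 5 while recording C21
deliver(network, procs, 1, 2)      # P2 receives the marker and records LS2
deliver(network, procs, 2, 1)      # P1 receives P2's marker, closing C21

snapshot_total = (procs[1].recorded_state + procs[2].recorded_state
                  + sum(procs[1].channel_state[2])
                  + sum(procs[2].channel_state[1]))
```

The recorded local states plus the recorded channel contents add up to the original total of 30, even though no single instant of the run had exactly those local states.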

44 EXAMPLE
Here, all processes are connected by communication channels Cij. Messages being sent over the channels are represented by arrows between the processes.

45 Snapshot s1:
P1 records LS1 and sends markers on C12 and C13.
P2 receives the marker from P1 on C12; it records its state LS2, records the state of C12 as empty, and sends markers on C21 and C23.
P3 receives the marker from P1 on C13; it records its state LS3, records the state of C13 as empty, and sends markers on C31 and C32.

46 P1 receives the marker from P2 on C21; as LS1 is recorded, it records the state of C21 as empty.
P1 receives the marker from P3 on C31; as LS1 is recorded, it records the state of C31 as empty.
P2 receives the marker from P3 on C32; as LS2 is recorded, it records the state of C32 as empty.
P3 receives the marker from P2 on C23; as LS3 is recorded, it records the state of C23 as empty.

47 Snapshot s2: now messages are in transit on C12 and C21.
P1 records LS1 and sends markers on C12 and C13.
P2 receives the marker from P1 on C12 after the message from P1 arrives; it records its state LS2, records the state of C12 as empty, and sends markers on C21 and C23.
P3 receives the marker from P1 on C13; it records its state LS3, records the state of C13 as empty, and sends markers on C31 and C32.

48 P1 receives the marker from P2 on C21; as LS1 is recorded, and a message has arrived since LS1 was recorded, it records the state of C21 as containing that message.
P1 receives the marker from P3 on C31; as LS1 is recorded, it records the state of C31 as empty.
P2 receives the marker from P3 on C32; as LS2 is recorded, it records the state of C32 as empty.
P3 receives the marker from P2 on C23; as LS3 is recorded, it records the state of C23 as empty.

49 TERMINATION DETECTION
Protocol
Pi sends a computation message to Pj:
1. Set Wi' and Wj to values such that Wi' + Wj = Wi, Wi' > 0, Wj > 0. (Wi' is the new weight of Pi.)
2. Send B(Wj) to Pj.
Pj receives a computation message B(W) from Pi:
1. Wj = Wj + W
2. If Pj is idle, Pj becomes active.

50 TERMINATION DETECTION CONTD..
Pi becomes idle:
1. Send C(Wi) to P0.
2. Wi = 0
3. Pi becomes idle.
Pi receives a control message C(W):
1. Wi = Wi + W
2. If Wi = 1, the computation has completed.

51 EXAMPLE
The picture shows a process P0, designated the controlling agent, with W0 = 1. It asks P1 and P2 to do some computation. It sets W1 to 0.2, W2 to 0.3, and W0 to 0.5. P2 in turn asks P3 and P4 to do some computations.

52 It sets W3 to 0.1 and W4 to 0.1, keeping 0.1 for itself.
When P3 terminates, it sends C(W3) = C(0.1) to P2, which changes W2 to 0.1 + 0.1 = 0.2.
When P2 terminates, it sends C(W2) = C(0.2) to P0, which changes W0 to 0.5 + 0.2 = 0.7.
When P4 terminates, it sends C(W4) = C(0.1) to P0, which changes W0 to 0.7 + 0.1 = 0.8.
When P1 terminates, it sends C(W1) = C(0.2) to P0, which changes W0 to 0.8 + 0.2 = 1.
P0 thereupon concludes that the computation is finished.
Total number of messages passed: 8 (one to start each computation, one to return the weight).
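The arithmetic of this example can be replayed with exact fractions, which avoids floating-point drift when checking that the weight returns to exactly 1. The class Process, its methods, and the message counter are assumptions made for this sketch; control messages are sent to whichever process the example names as the recipient.

```python
from fractions import Fraction


class Process:
    def __init__(self, name, weight=0):
        self.name = name
        self.weight = Fraction(weight)
        self.active = self.weight > 0


messages = 0


def send_computation(src, dst, share):
    # B(W): split the sender's weight and hand `share` to the receiver.
    global messages
    assert 0 < share < src.weight
    src.weight -= share
    dst.weight += share
    dst.active = True
    messages += 1


def become_idle(proc, collector):
    # C(W): return all remaining weight to the collector and go idle.
    global messages
    collector.weight += proc.weight
    proc.weight = Fraction(0)
    proc.active = False
    messages += 1


p0 = Process("P0", 1)
p1, p2, p3, p4 = (Process(n) for n in ("P1", "P2", "P3", "P4"))

send_computation(p0, p1, Fraction(2, 10))   # W1 = 0.2
send_computation(p0, p2, Fraction(3, 10))   # W2 = 0.3, W0 = 0.5
send_computation(p2, p3, Fraction(1, 10))   # W3 = 0.1
send_computation(p2, p4, Fraction(1, 10))   # W4 = 0.1, W2 = 0.1

become_idle(p3, p2)                         # W2 = 0.2
become_idle(p2, p0)                         # W0 = 0.7
w0_midway = p0.weight
become_idle(p4, p0)                         # W0 = 0.8
become_idle(p1, p0)                         # W0 = 1

finished = p0.weight == 1
```

The total weight in the system stays at 1 throughout, so P0 seeing its own weight return to 1 means no other process still holds any.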

61 When Q receives a marker along a channel C:
- If Q has not recorded its state, then Q records the state of C as empty; Q then follows the marker sending rule.
- If Q has already recorded its state, it records the state of C as the sequence of messages received along C after Q's state was recorded and before Q received the marker along C.

66 BIRMAN-SCHIPER-STEPHENSON PROTOCOL
- To broadcast m from process i, increment C_i[i] and timestamp m with VT_m = C_i.
- When j ≠ i receives m, j delays delivery of m until C_j[i] = VT_m[i] - 1 and C_j[k] ≥ VT_m[k] for all k ≠ i.
- Delayed messages are queued at j, sorted by vector time. Concurrent messages are sorted by receive time.
- When m is delivered at j, C_j is updated according to the vector clock rule.
