Ordering and Consistent Cuts Presented by Chi H. Ho.


1 Ordering and Consistent Cuts Presented by Chi H. Ho

2 Time, Clocks, and the Ordering of Events in a Distributed System Leslie Lamport

3 Introduction 2000 PODC Influential Paper Award Outline of the paper: not in presented order –Partial and Total Orderings –Logical and Physical Clocks –Clock and Strong Clock Conditions –Synchronize Physical Clocks Beyond…

4 “Happened Before” a → b if: –a and b are events in the same process and a comes before b, or –a is the send event of some message and b is the receive event of the same message. Transitive: (a → b) & (b → c) ⇒ (a → c). Concurrent: ¬(a → b) & ¬(b → a). Partial Ordering

5 Examples q5 → p4, q2 → q3, p1 → r3, q2 // p2, q2 // p3 Partial Ordering

6 Logical Clock Clock Condition: ∀ a,b: a → b ⇒ C(a) < C(b) Partial Ordering Implementation

7 Logical Clock Implementation Rules: –IR1: Each process Pi increments Ci between any two successive events. –IR2: If event a is the sending of a message m by process Pi, then the message contains a timestamp Tm = Ci(a). Upon receiving a message m, process Pj sets Cj greater than or equal to its present value and greater than Tm. Partial Ordering Implementation
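
The two implementation rules can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk or the paper: the class name `LamportClock` and its methods are hypothetical, and the receive step picks `max(...) + 1` as one simple way to satisfy "greater than or equal to its present value and greater than Tm".

```python
class LamportClock:
    """Logical clock for one process (a sketch of IR1 and IR2)."""

    def __init__(self):
        self.time = 0

    def tick(self):
        # IR1: advance the clock between any two successive local events.
        self.time += 1
        return self.time

    def send(self):
        # IR2, sender side: sending is an event; the message carries Tm = Ci(a).
        return self.tick()

    def receive(self, tm):
        # IR2, receiver side: move to a value >= the present one and > Tm.
        self.time = max(self.time, tm) + 1
        return self.time


p0, p1 = LamportClock(), LamportClock()
p0.tick()                # a local event on p0
tm = p0.send()           # p0 sends a message stamped Tm
stamp = p1.receive(tm)   # p1's clock jumps past Tm
```

Because the receiver always moves past the timestamp it reads, the receive event gets a larger clock value than the matching send event, which is what the Clock Condition requires.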

8-16 Examples (animated figure built up over nine slides: logical clock values on two processes P0 and P1, both starting at 0, with message timestamps [3] and [2] shown in brackets) Partial Ordering Implementation

17 Extended “Happened Before” For an event a in Pi and an event b in Pj, a => b iff: –Ci(a) < Cj(b), or –Ci(a) = Cj(b) and Pi ≺ Pj. Total Ordering
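
As a small sketch of the total order, assume each event is represented by a `(clock_value, process_id)` pair and that the arbitrary process order ≺ is plain integer comparison of process ids. Both choices are assumptions made for illustration. Under them, the rule coincides with Python's built-in tuple comparison.

```python
def totally_before(a, b):
    """a and b are (clock_value, process_id) pairs; True if a => b."""
    (ca, pa), (cb, pb) = a, b
    return ca < cb or (ca == cb and pa < pb)


events = [(2, 1), (1, 0), (2, 0), (1, 1)]
ordered = sorted(events)  # lexicographic tuple order matches the rule above
```

Ties on the clock value are broken by process id, so any two distinct events are ordered one way or the other.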

18 Example Application Shared resource granting –Fixed number of processes –Single shared resource –Requirements: I. Mutually exclusive II. Fair III. Exhaustive Total Ordering

19 Example Application Solution: Distributed algorithm Model: –Channels are FIFO –Each process maintains a process queue Algorithm: –Request: broadcast “Tm:Pi request resource” –Release: broadcast “Tm:Pi release resource” –Receive request: enqueue –Receive release: dequeue –Resource granted to Pi (local decision) when: its queue entry “Tm:Pi request resource” has the minimum Tm, and Pi has received from every other process a msg timestamped later than Tm. Note: –Can be generalized to solve Replicated State Machine! Total Ordering
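
The local grant decision can be sketched as a pure function. This is an illustrative sketch under assumptions: requests are `(Tm, pid)` pairs ordered by the total order (tuple comparison), and the names `may_enter`, `queue`, and `latest_seen` are hypothetical. Message passing, enqueueing, and dequeueing are left out.

```python
def may_enter(me, queue, latest_seen):
    """Local grant decision for process `me`.

    queue       -- set of pending requests as (Tm, pid) pairs
    latest_seen -- dict: other pid -> latest timestamp received from it
    """
    mine = [req for req in queue if req[1] == me]
    if not mine:
        return False
    my_req = min(mine)
    # Condition 1: my request comes first in the total order.
    if my_req != min(queue):
        return False
    # Condition 2: every other process has sent something ordered later.
    return all((ts, pid) > my_req for pid, ts in latest_seen.items())


queue = {(1, 0), (2, 1)}
first = may_enter(0, queue, {1: 2})       # P0 holds the earliest request
second = may_enter(1, queue, {0: 3})      # P1 must wait behind P0
unsure = may_enter(0, {(5, 0)}, {1: 3})   # P1 might still have an earlier request
```

The second condition is what makes the decision safe without a coordinator: with FIFO channels, a later-stamped message from a process rules out an earlier request from it still being in transit.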

20-24 Anomaly (animated figure built up over five slides: two messages to Amazon.com timestamped [19] and [7], related by an external event outside the system)

25 Strong Clock Condition S = {events in the system}; S′ = S ⋃ {relevant external events}; ⇝ is “happened before” for S′. ∀ a,b ∈ S: a ⇝ b ⇒ C(a) < C(b) Avoid Anomaly

26 Physical Clocks PC1: (drift rate bound) ∃ κ << 1 such that ∀ i: |dCi(t)/dt – 1| < κ. PC2: (drift bound) ∀ i,j: |Ci(t) – Cj(t)| < ε.

27 Avoid Anomaly μ < shortest msg transmission time. ∀ i,j,t: Ci(t+μ) – Cj(t) > 0, which holds if ε/(1–κ) ≤ μ. (figure: processes i and j and Amazon.com, annotated with Cj(t) and Ci(t+μ)) Physical Clocks

28 Implementation Rules IR1’: –For each i, if Pi does not receive a message at physical time t, then Ci is differentiable at t and dCi(t)/dt > 0. IR2’: –(a) If Pi sends a message m at physical time t, then m contains a timestamp Tm = Ci(t). –(b) Upon receiving a message m at time t’, process Pj sets Cj(t’) equal to maximum(Cj(t’–0), Tm + μm), where μm is the known minimum delay of m. Physical Clocks
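
Rule IR2’(b) is a single maximum, shown here as a sketch. The function name `on_receive` and the numeric values are made up for illustration; `mu_m` stands for the minimum delay of message m, which the receiver is assumed to know.

```python
def on_receive(c_j_now, t_m, mu_m):
    """IR2'(b): new value of Cj when a message stamped Tm arrives.

    c_j_now -- Cj just before the receipt, i.e. Cj(t' - 0)
    t_m     -- timestamp carried by the message
    mu_m    -- assumed known minimum delay of message m
    """
    return max(c_j_now, t_m + mu_m)


ahead = on_receive(10.0, 12.0, 0.5)    # receiver was behind: clock jumps forward
behind = on_receive(20.0, 12.0, 0.5)   # receiver was already ahead: unchanged
```

The clock never moves backwards, and after the receipt it is at least as large as the sender's timestamp plus the time the message must have spent in transit.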

29 Synchronize Physical Clocks Problem statement: –IR1’ and IR2’ are followed, –Message delay is bounded, –Clocks satisfy PC1, –Goal: PC2 Algorithm: –Every τ seconds, a message is sent over every arc. Guarantees (d = network diameter, ξ = bound on the unpredictable message delay): –Clocks are synchronized after t0 + τd –ε ≈ d(2κτ + ξ) Physical Clocks
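
To get a feel for the synchronization bound from Lamport's paper, ε ≈ d(2κτ + ξ), here is a tiny worked calculation. The symbols follow the paper (d: network diameter, κ: drift rate bound, τ: resend period, ξ: bound on the unpredictable message delay); the numbers below are invented purely for illustration.

```python
def sync_bound(d, kappa, tau, xi):
    """Approximate clock skew bound: epsilon = d * (2 * kappa * tau + xi)."""
    return d * (2 * kappa * tau + xi)


# Hypothetical network: diameter 3, drift rate 1e-6,
# a message on every arc every 10 s, 10 ms of unpredictable delay.
eps = sync_bound(d=3, kappa=1e-6, tau=10.0, xi=0.01)
settle_time = 10.0 * 3   # tau * d seconds after t0
```

With these made-up values the bound is dominated by the delay term d·ξ; the drift term 2κτ only matters when messages are sent rarely.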

30 Beyond… Shortcomings: –No gap-detection property –C(a) < C(b) ⇒ ??? –Bounds are not practical (nor is PC!)

31 Gap Detection Property Problem statement: –Given: a, b, C(a), C(b), C(a) < C(b), –Determine if c exists, where C(a) < C(c) < C(b) ? Beyond…

32 Another Strong Clock Condition a → b ⇔ C(a) < C(b) Beyond…

33 What clock, then? Causal histories: Beyond… Vector Clocks:

34 More on Vector Clocks –Strong Clock Condition –Concurrent –Pair-wise Inconsistent –Consistent Cut –Counting –Gap Detection Beyond…

35 More on Vector Clocks –Strong Clock Condition –Concurrent –Pair-wise Inconsistent –Consistent Cut –Counting –Gap Detection, but… X: only Weak Gap-Detection. Given a, b, can detect existence of c such that ¬(c → a) & (c → b) Beyond…
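
A few of the vector clock properties listed above can be sketched directly on lists of integers. This is an illustrative sketch, not code from the talk: the function names are hypothetical, and the weak gap-detection test follows the formulation in the Babaoglu and Marzullo chapter (an entry k, other than b's own process, where a's vector is strictly smaller than b's).

```python
def vc_less(u, v):
    """u < v: every entry <= and the vectors differ (so u happened before v)."""
    return all(x <= y for x, y in zip(u, v)) and u != v


def concurrent(u, v):
    return not vc_less(u, v) and not vc_less(v, u)


def merge_on_receive(local, received, i):
    """Process i receives a message: entrywise max, then count the receive."""
    merged = [max(x, y) for x, y in zip(local, received)]
    merged[i] += 1
    return merged


def weak_gap(va, vb, j):
    """Indices k != j with va[k] < vb[k], where b occurred at process j.

    For each such k, some event c at process k has not (c -> a) and c -> b.
    """
    return [k for k in range(len(va)) if k != j and va[k] < vb[k]]


a = [1, 0, 0]                                   # first event at P0
b = merge_on_receive([0, 0, 0], a, 1)           # P1 receives P0's message
c = [0, 0, 1]                                   # unrelated event at P2
```

Here `a` happened before `b`, `c` is concurrent with both, and `weak_gap(c, b, 1)` reports process 0, because an event there (namely `a`) precedes `b` but not `c`.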

36 Reference O. Babaoglu and K. Marzullo. Consistent global states of distributed systems: Fundamental concepts and mechanisms. In Sape Mullender, editor, Distributed Systems, ch. 4, pages 55-96. Addison Wesley, 2nd ed., 1993. http://citeseer.ist.psu.edu/babaoglu93consistent.html Note: some material in this paper is used to clarify a few concepts in the next paper. Beyond…

37 Distributed Snapshots: Determining Global States of Distributed Systems K. Mani Chandy Leslie Lamport

38 Introduction Outline of the paper: –Motivation –Model –Algorithm –Correctness –Other issues Beyond…

39 Motivation Capture the global state of a system. Really? True global state: Impossible!!! (figure: space-time diagram of process p1 with events e11 to e16 and process p2 with events e21 to e25)

40 Motivation Capture the global state of a system. Really? These are what can be done. Are they useful? (same figure)

41 Motivation Capture the global state of a system. Useful? Equivalent! (figure: two copies of the space-time diagram)

42 Motivation Capture the global state of a system. Useful? Consistent, but does not happen in reality. (figure: two copies of the space-time diagram)

43 Motivation Capture the global state of a system. Useful? Not even consistent! (figure: two copies of the space-time diagram)
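
What separates a consistent cut from an inconsistent one can be checked mechanically: a cut is inconsistent if it contains the receipt of a message but not its sending. The sketch below assumes a cut is given as the number of events included per process and that messages are listed as (send, receive) pairs with 1-based event indices; the message used in the example is hypothetical and not taken from the slides' figure.

```python
def is_consistent(cut, messages):
    """cut      -- dict: process -> number of its events inside the cut
    messages -- list of ((sender, send_index), (receiver, recv_index))
    """
    for (sp, si), (rp, ri) in messages:
        received_in_cut = ri <= cut[rp]
        sent_in_cut = si <= cut[sp]
        if received_in_cut and not sent_in_cut:
            return False   # an effect without its cause
    return True


# Hypothetical message: sent as p1's 2nd event, received as p2's 3rd event.
msgs = [(("p1", 2), ("p2", 3))]
bad = is_consistent({"p1": 1, "p2": 3}, msgs)    # receipt included, send missing
good = is_consistent({"p1": 2, "p2": 2}, msgs)   # send included, receipt still pending
```

A message that has been sent but not yet received is fine; it simply belongs to the channel state of the cut.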

44 Motivation Capture the global state of a system. Useful? Yes: –To detect stable properties of a system: y(S) ⇒ y(S’) for all S’ reachable from S. –E.g.: “computation has terminated,” “the system is deadlocked,” “all tokens in a token ring have disappeared.”

45 Model A distributed system (figure on the right). A global state = set of processes’ and channels’ states. Event: –atomic –e = <p, s, s’, M, c> Computation: –seq = (e_i : 0 ≤ i ≤ n) –S_{i+1} = next(S_i, e_i) Channels’ assumptions: –Singly directed –FIFO –Asynchronous –Error free –Infinite buffer

46 Algorithm Invoker: behave as if receiving a marker from a virtual node. Receiving rule for process q receiving a marker along channel c : if q has not recorded its state then begin q records its state; q records the state of c as the empty sequence end else q records the state of c as the sequence of messages received along c after q’s state was recorded and before q received the marker along c. Sending rule for a process p : for each outgoing channel c : p sends one marker along c after p records its state and before p sends further messages along c.
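
The marker rules can be exercised in a small single-threaded simulation. This is a sketch under assumptions, not the authors' code: process state is an integer balance, messages are money transfers, FIFO channels are `deque` objects, and all names (`Process`, `connect`, `deliver`) are hypothetical. The invoker simply calls `record()` on itself.

```python
from collections import deque

MARKER = "MARKER"


class Process:
    def __init__(self, name, balance):
        self.name = name
        self.balance = balance
        self.inbox = {}         # sender name -> FIFO channel into this process
        self.outbox = {}        # receiver name -> FIFO channel out of this process
        self.recorded = None    # recorded local state
        self.chan_state = {}    # sender name -> recorded channel contents
        self.recording = set()  # incoming channels still being recorded

    def send(self, to, amount):
        self.balance -= amount
        self.outbox[to].append(amount)

    def record(self):
        # Record local state, then put one marker on every outgoing channel
        # before any further message is sent on it (sending rule).
        self.recorded = self.balance
        self.recording = set(self.inbox)
        self.chan_state = {src: [] for src in self.inbox}
        for channel in self.outbox.values():
            channel.append(MARKER)

    def deliver(self, src):
        msg = self.inbox[src].popleft()
        if msg == MARKER:
            if self.recorded is None:
                self.record()             # channel src ends up recorded as empty
            self.recording.discard(src)   # channel src is now fully recorded
        else:
            if src in self.recording:
                self.chan_state[src].append(msg)  # was in flight at snapshot time
            self.balance += msg


def connect(a, b):
    """Create a one-way FIFO channel from a to b."""
    channel = deque()
    a.outbox[b.name] = channel
    b.inbox[a.name] = channel


p, q = Process("p", 100), Process("q", 50)
connect(p, q)
connect(q, p)
p.send("q", 10)      # in flight on p->q
q.send("p", 5)       # in flight on q->p
p.record()           # p starts the snapshot
q.deliver("p")       # q gets the 10
q.deliver("p")       # q gets the marker and records
p.deliver("q")       # p gets the 5 while recording channel q->p
p.deliver("q")       # p gets q's marker
total = p.recorded + q.recorded + sum(p.chan_state["q"]) + sum(q.chan_state["p"])
```

In this run the recorded process states plus the recorded channel contents add up to the 150 units that exist in the system, even though no single instant of the execution had exactly those balances.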

47 Illustration Next 14 slides, courtesy of Professor Birman.

48 Chandy/Lamport A network (figure: nodes p, q, r, s, t, u, v, w, x, y, z)

49 Chandy/Lamport A network: I want to start a snapshot

50 Chandy/Lamport A network: p records local state

51 Chandy/Lamport A network: p starts monitoring incoming channels

52 Chandy/Lamport A network: “contents of channel p-y”

53 Chandy/Lamport A network: p floods message on outgoing channels…

54 Chandy/Lamport A network

55 Chandy/Lamport A network: q is done

56 Chandy/Lamport A network (marked nodes: q)

57 Chandy/Lamport A network (marked nodes: q)

58 Chandy/Lamport A network (marked nodes: q, z, s)

59 Chandy/Lamport A network (marked nodes: q, v, z, x, u, s)

60 Chandy/Lamport A network (marked nodes: q, v, w, z, x, u, s, y, r)

61 Chandy/Lamport A snapshot of a network (all nodes marked: q, x, u, s, v, r, t, w, p, y, z) Done!

62 Correctness Consistency Termination

63 Consistency m is recorded iff send(m) is recorded: –sender’s state recording and marker sending are done atomically. m is not recorded more than once: –if the channel is recorded before the receiver, it will be empty. –if the channel is recorded after the receiver, none of the in-channel messages will be recorded as part of the receiver’s state. Correctness:

64 Termination Assumptions: –L1: no marker remains forever in a channel. –L2: processes’ states are recorded in finite time. Every process either spontaneously records its state, or there is a path from such a process. Every channel is flushed by a marker after the sender records its state. Correctness:

65 Remaining Issues Property of recorded state: Si → S* → Sf Stable detection: –Stable property: (y(Si) ⇒ definite) and (definite ⇒ y(Sf)) –Algorithm: begin record a global state S*; definite := y(S*) end.
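
The detection algorithm is just "take a snapshot, evaluate the predicate on it". A minimal sketch follows, assuming the snapshot is delivered as a dictionary; the function names, the dictionary layout, and the termination predicate are all hypothetical choices made for illustration.

```python
def detect_stable(record_global_state, y):
    """One round of stable-property detection: definite := y(S*)."""
    s_star = record_global_state()
    return y(s_star)


def terminated(state):
    # Hypothetical stable property: no process is active, no message in flight.
    nobody_active = not any(state["active"].values())
    channels_empty = not any(state["channels"].values())
    return nobody_active and channels_empty


quiet = {"active": {"p": False, "q": False},
         "channels": {("p", "q"): [], ("q", "p"): []}}
busy = {"active": {"p": False, "q": False},
        "channels": {("p", "q"): ["work"], ("q", "p"): []}}
definite_quiet = detect_stable(lambda: quiet, terminated)
definite_busy = detect_stable(lambda: busy, terminated)
```

Because the property is stable and the recorded state lies between the initial and final states, a `True` answer stays true afterwards, while a `False` answer only says the property did not hold when the snapshot began.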

66 Beyond… Channels’ assumptions: –Singly directed –FIFO –Asynchronous –Error free –Infinite buffer

67 Non-FIFO What is FIFO for? –It separates before-snapshot messages from after-snapshot messages. A snapshot counter piggybacked on messages would do just fine! Beyond:
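
One way to read the piggybacking idea is sketched below. It is an assumption-laden illustration rather than the presenter's design: every message carries the sender's snapshot counter, a receiver that sees a higher counter records its state before applying the message, and a message carrying a lower counter than the receiver's is counted as channel state. Deciding when a channel's recording is complete is not handled here.

```python
class Node:
    def __init__(self, state):
        self.state = state
        self.epoch = 0         # number of snapshots this node has joined
        self.recorded = None   # recorded local state for the current snapshot

    def start_snapshot(self):
        self.recorded = self.state
        self.epoch += 1

    def send(self, payload):
        return (self.epoch, payload)   # piggyback the snapshot counter

    def receive(self, message):
        epoch, payload = message
        if epoch > self.epoch:
            # Sent after the sender's snapshot: record before applying it.
            self.recorded = self.state
            self.epoch = epoch
        in_channel_state = epoch < self.epoch   # sent before, received after
        self.state += payload
        return in_channel_state


a, b = Node(10), Node(20)
m1 = a.send(1)            # sent before a's snapshot
a.start_snapshot()
m2 = a.send(2)            # sent after a's snapshot
late = b.receive(m2)      # arrives first on a non-FIFO channel
early = b.receive(m1)     # overtaken message arrives afterwards
```

Even though `m2` overtakes `m1`, the counter lets `b` record its state before applying `m2` and classify `m1` as in flight, which is the separation FIFO markers provided.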

68 Beyond… Channels’ assumptions: –Singly directed –FIFO –Asynchronous –Error free –Infinite buffer Messages can be corrupted/duplicated Messages can be dropped

69 Unreliable channels How to deal with corruption? –Checksum/ECC; reduced to drop. How to deal with duplication? –Message ID How to deal with dropping? –Channel states are not needed anymore. –Markers indicate completion. Beyond:

70 Even More Aggressive… Don’t want to piggyback! Step 1: no piggybacking: –Block all messages sent after recording local state and before receiving marker from all neighbors. Step 2: no blocking, min piggybacking –Blocked messages are sent with piggybacked snapshot info. Beyond:

71 Conclusion Two influential papers. Much work has been built upon these results. They can be improved significantly when adapted to particular systems. Additional comments/suggestions?

