Distributed Computing 5. Snapshot Shmuel Zaks ©

Distributed Computing 5. Snapshot Shmuel Zaks zaks@cs.technion.ac.il ©

2 The snapshot algorithm (Chandy and Lamport)

5 Goal: design a snapshot (= global-state-detection) algorithm that:
• will record a collection of states of all system components (which forms a global system state),
• will not change the underlying computation,
• will not freeze the underlying computation.

6 A Process Can…
• record its own state,
• send and receive messages,
• record messages it sends and receives,
• cooperate with other processes.
• Processes do not share clocks or memory.
• Processes cannot record their state precisely at the same instant.

7 Motivation
• Many problems in distributed systems can be stated in terms of the problem of detecting global states. Stable property detection problems: termination detection, deadlock detection, etc.
• Checkpointing

8 Stable Property Detection Problem
D: a distributed system
y: a predicate function defined on the set of global states of D
S, S’: global states of D
y is stable if y(S) implies y(S’) for all S’ reachable from S.
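As a rough illustration of the definition above, the following Python sketch checks stability on a small, fully enumerated state space. The names `is_stable` and `successors`, and the toy counter system, are assumptions made for the example and do not come from the slides. Only one-step successors are checked, which suffices when every reachable state is listed (by induction along any path).

```python
def is_stable(y, states, successors):
    """Return True if y(S) implies y(S') for every one-step successor S' of S.

    When `states` contains every reachable state, closure under single
    steps implies closure under reachability.
    """
    return all(y(t) for s in states if y(s) for t in successors(s))


# Toy system (hypothetical): a counter that only moves from n to n + 1, up to 5.
states = range(6)


def successors(n):
    return [n + 1] if n < 5 else []


print(is_stable(lambda n: n >= 3, states, successors))  # True: once reached, it stays true
print(is_stable(lambda n: n == 3, states, successors))  # False: the step 3 -> 4 breaks it
```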

9
• Many distributed algorithms are structured as a sequence of phases.
• A phase: a transient part, then a stable part.
• Phase termination vs. computation termination.
• Our view on the problem:
  i. detect the termination of a phase
  ii. initiate a new phase
Notice that “the kth phase has terminated” is a stable property.

10 Model
• Distributed system D is a finite, labeled, directed graph.
[Figure: processes p and q connected by channels C1 and C2]
• Channels have infinite buffers, are error-free and preserve FIFO order.
• Message delay is bounded, but unknown.

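11 State of a Channel
[Figure: processes p and q connected by channels C1 and C2; messages 1, 2, 3 were sent on C1 and message 1 has been received]
• [1, 2, 3]: sequence X of messages that were sent
• [1]: sequence Y of received messages (a prefix of X)
• [2, 3]: state of C1, that is, X \ Y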
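The channel state X \ Y can be written out directly. This is a minimal sketch; the function name `channel_state` is an assumption made for illustration.

```python
def channel_state(sent, received):
    """Return the messages still in transit on a FIFO channel.

    sent     -- sequence X of messages sent so far
    received -- sequence Y of messages received so far (must be a prefix of X)
    """
    sent, received = list(sent), list(received)
    if sent[:len(received)] != received:
        raise ValueError("on a FIFO channel, received must be a prefix of sent")
    # What remains after removing the received prefix is the channel state.
    return sent[len(received):]


print(channel_state([1, 2, 3], [1]))  # [2, 3]
```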

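12 Example: System
Distributed system: processes p and q connected by channels C1 and C2.
Initial global state: one process in state A, the other in state B, both channels empty (Ø).
State transitions (same for p and q): states A and B, with transitions labeled send and receive.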

13 Global state transition diagram (deterministic)
[Figure: global states of the system from slide 12, linked by the events "p sends", "q receives", "q sends", "p receives"]
A computation corresponds to a path in the diagram.

14 Example: System
Distributed system: processes p and q connected by channels C1 and C2.
State transitions: separate diagrams for p and q over the states A, B and C, D, each with send and receive transitions.

15 Global state transition diagram (non-deterministic)
[Figure: global states of the system from slide 14, linked by the events "p sends", "q sends", "p receives", "q receives"]

16 We look at the following sequence of events:
[Figure: a path in the diagram of slide 15 through the events "p sends", "q sends", "p receives"]

17 In the snapshot algorithm:
• Each process records its own state.
• p and q cooperate to record the state of C.
[Figure: channel C from p to q]

18 Example: System
[Figure: the system of slide 12 with the recording steps Record C, Record q, Record p]
Recorded state: no token.

19 Example: System
[Figure: the system of slide 12 with the recording steps Record p, Record C, Record q]
Recorded state: two tokens.

20
[Figure: two timelines relating the events "P’s state recorded", "P sends a message on C" and "C’s state recorded" to the recording orders of the two previous slides]

21
• q will record the state of C.
• q starts recording C after it records its state.
• q stops when it receives a marker from p.
• p and q have to coordinate, using a special marker.
But: how does q know when to record its state?
[Figure: channel C from p to q]

22 The snapshot algorithm
Who starts? We assume one process.
Hw: extend the discussion and the proof to any number of starters.

23 (Intuition for the Algorithm)
• Who will record the state of channel C? q
• How does q know when to stop recording? p sends a marker right after it records its state, and before sending any other message.
• q starts recording after it records its state.
[Figure: channel C from p to q]

24 The snapshot algorithm: channel recording
• Starts when q records itself.
• Ends when q receives a marker along C.
Note: for any q ≠ p0, the channel along which the marker arrived first is recorded as Ø.
[Figure: channel C from p to q]

25 The snapshot algorithm: state recording
• p0 starts.
• p0 records its state, and then broadcasts a marker. Shout-algorithm = PI (Propagation of Information) = hot potato = …
• When q receives a marker for the first time, it records its own state.

26 The snapshot algorithm
Marker-Sending Rule for a process p:
1. record the state of p
2. send a marker along each outgoing channel c before sending any other message
Marker-Receiving Rule for a process q, on receiving a marker along channel c:
if q’s state is not recorded:
  1. record q’s state;
  2. record c’s state = Ø;
else:
  c’s state is the sequence of messages received along c since q recorded its state
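The two rules can be exercised in a small simulation. The sketch below is illustrative only: the classes `Process` and `System`, the `MARKER` sentinel, and the toy application (integer balances with transfers between p and q) are all assumptions, not part of the slides. Message delivery is driven by hand so the run is deterministic.

```python
from collections import deque

MARKER = "MARKER"  # hypothetical sentinel standing in for the slides' marker


class Process:
    def __init__(self, name, balance):
        self.name = name
        self.balance = balance       # application state (assumed toy example)
        self.recorded_state = None   # local state captured by the snapshot
        self.channel_state = {}      # incoming channel -> recorded messages
        self.recording = set()       # incoming channels still being recorded


class System:
    def __init__(self, balances, edges):
        self.procs = {n: Process(n, b) for n, b in balances.items()}
        self.chan = {e: deque() for e in edges}  # (src, dst) -> FIFO queue

    def send(self, src, dst, amount):
        """Application event: src transfers `amount` to dst."""
        self.procs[src].balance -= amount
        self.chan[(src, dst)].append(amount)

    def record(self, p):
        """Marker-sending rule: record own state, then send a marker on
        every outgoing channel before any other message."""
        proc = self.procs[p]
        proc.recorded_state = proc.balance
        proc.recording = {e for e in self.chan if e[1] == p}
        for e in proc.recording:
            proc.channel_state[e] = []
        for e in self.chan:
            if e[0] == p:
                self.chan[e].append(MARKER)

    def deliver(self, src, dst):
        """Receive the message at the head of channel (src, dst)."""
        e = (src, dst)
        msg = self.chan[e].popleft()
        proc = self.procs[dst]
        if msg == MARKER:
            # Marker-receiving rule.
            if proc.recorded_state is None:
                self.record(dst)       # this channel's state stays empty
            proc.recording.discard(e)  # stop recording this channel
        else:
            proc.balance += msg
            if e in proc.recording:    # arrived after recording, before the marker
                proc.channel_state[e].append(msg)


system = System({"p": 10, "q": 10}, [("p", "q"), ("q", "p")])
system.send("q", "p", 3)   # a message is in transit on q -> p
system.record("p")         # p starts the snapshot: records 10, sends a marker
system.send("p", "q", 4)   # post-recording send, travels behind the marker
system.deliver("q", "p")   # p receives 3 while recording channel q -> p
system.deliver("p", "q")   # q receives the marker, records 7, sends a marker
system.deliver("p", "q")   # q receives 4 (not part of the snapshot)
system.deliver("q", "p")   # p receives the marker and stops recording

recorded = sum(pr.recorded_state for pr in system.procs.values())
in_transit = sum(sum(m) for pr in system.procs.values()
                 for m in pr.channel_state.values())
print(recorded, in_transit)  # 17 3
```

In this run the recorded process states (10 and 7) plus the recorded channel contents (3) add up to the 20 units the toy system started with, even though the two processes recorded at different moments.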

27 Termination
Assumption: No marker remains forever in an input channel.
Claim: If the graph is strongly connected and at least one process records its state, then all processes will record their state in finite time.
Proof: by induction.
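The claim can be visualized by flooding markers through the graph. The sketch below assumes a hypothetical adjacency-list encoding of the graph and delivers markers in breadth-first order, which is only one of many possible delivery orders; the function name `recording_order` is made up for the example.

```python
from collections import deque


def recording_order(graph, initiator):
    """Flood markers from the initiator and return the processes in the
    order in which they record their state."""
    recorded = [initiator]
    seen = {initiator}
    frontier = deque([initiator])
    while frontier:
        p = frontier.popleft()
        for q in graph[p]:        # p sends a marker on every outgoing channel
            if q not in seen:     # q records on the first marker it receives
                seen.add(q)
                recorded.append(q)
                frontier.append(q)
    return recorded


# A directed ring is strongly connected, so every process ends up recording.
ring = {"a": ["b"], "b": ["c"], "c": ["a"]}
print(recording_order(ring, "a"))  # ['a', 'b', 'c']
```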

28 The Recorded Global State
Ex: System: processes p and q connected by channels C1 and C2.
State transitions: separate diagrams for p and q over the states A, B and C, D, each with send and receive transitions.

29
[Figure: an execution of the system of slide 28 through the events "p sends", "q sends", "p receives", together with the recorded global state]

30 What did we get?

31
• An event e in process p is an atomic action: it can change the state of p, and the state of at most one channel c incident on p (by sending/receiving a message M along c).
• e is defined by e = ⟨p, s, s’, M, c⟩.
• e may occur in global state S if
  1. the state of p in S is s;
  2. a. if c is directed towards p: c’s state has M at its head, and M is deleted after applying e;
     b. if c is directed from p: c’s state has M at its tail after applying e;
  3. the state of p after applying e is s’.

32 Process State and Global State
• A process: a set of states, an initial state, and a set of events.
• A global state S: a collection of process states and channel states.
• Initially, each process is in its initial state and all channels are empty.
• next(S, e) is the global state after event e is applied to global state S.

33 Process State and Global State
• seq = (e_i : i = 0…n) is a computation of the system iff, for every i, e_i may occur in S_i and S_{i+1} = next(S_i, e_i), where S_0 is the initial global state.
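The definitions of events, next(S, e) and computations can be sketched in a few lines. Everything named here (`Event`, `may_occur`, `next_state`, `is_computation`, the "token" message and the chosen initial states) is a hypothetical encoding for illustration; a global state is represented as a pair (process states, channel contents).

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Event:
    p: str                                  # process in which the event occurs
    s: str                                  # state of p before the event
    s_next: str                             # state of p after the event
    msg: Optional[str] = None               # message M, if any
    chan: Optional[Tuple[str, str]] = None  # channel c as (source, destination)


def may_occur(state, e):
    procs, chans = state
    if procs[e.p] != e.s:
        return False
    if e.chan is not None and e.chan[1] == e.p:  # receive: M must be at the head
        queue = chans[e.chan]
        return len(queue) > 0 and queue[0] == e.msg
    return True


def next_state(state, e):
    procs, chans = state
    procs = {**procs, e.p: e.s_next}
    chans = dict(chans)
    if e.chan is not None:
        if e.chan[1] == e.p:
            chans[e.chan] = chans[e.chan][1:]         # delete M from the head
        else:
            chans[e.chan] = chans[e.chan] + (e.msg,)  # append M at the tail
    return procs, chans


def is_computation(initial, events):
    state = initial
    for e in events:
        if not may_occur(state, e):
            return False
        state = next_state(state, e)
    return True


# Assumed initial global state: p in A, q in B, both channels empty.
S0 = ({"p": "A", "q": "B"}, {("p", "q"): (), ("q", "p"): ()})
send = Event("p", "A", "B", "token", ("p", "q"))
recv = Event("q", "B", "A", "token", ("p", "q"))

print(is_computation(S0, [send, recv]))  # True
print(is_computation(S0, [recv, send]))  # False: nothing to receive yet
```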

34 The Recorded Global State
seq = (e_i : i ≥ 0): a distributed computation
S_i: the state of the system right before e_i occurs
S_0: the initial state of the system
S_t: the state of the system at the termination of the algorithm
S*: the recorded global state

35 Definition
• Event e_j is called pre-recording if e_j is in a process p and p records its state after e_j in seq.
• Event e_j is called post-recording if e_j is in a process p and p records its state before e_j in seq.
Assume that e_{j-1} is a post-recording event that occurs immediately before a pre-recording event e_j in seq.

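36 Lemma: e_{j-1} and e_j can be interchanged in seq, and the global state reached after both events is unchanged.
Proof: e_{j-1} occurs in some process p and e_j in some process q, and q ≠ p (since e_{j-1} is post-recording and e_j is pre-recording).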

37 The only scenario that might prevent interchanging the two events is that a message M is sent at e_{j-1} and received at e_j. But this is not possible: if M is sent at e_{j-1}, then M is sent after p recorded its state, so a marker was sent to q before M. Hence, when M is received at e_j, q has already recorded its state, so e_j is post-recording, a contradiction.

38 Hence, event e_j can occur in global state S_{j-1}. The state of process p is not altered by e_j, hence e_{j-1} can occur after e_j.

39 We have to show that the states of all processes and channels are the same in S2 and S4. This clearly holds for processes and channels that do not take part in e_{j-1} and e_j.

40 States: the states of p and q in S2 and in S4 are the same.
Channels: whether e_{j-1} or e_j sends a message along a channel, receives one, or does neither, the same is done in both scenarios, so the states of the channels in S2 and S4 are the same. (End of proof.)

(The Recorded Global State)

42 Proof
Using the lemma, swap events until all post-recording events appear after all pre-recording events. The resulting computation is seq’.
All that is left to show: S* is the global state after all pre-recording events and before all post-recording events.
1. Process states
2. Channel states

43 Claim: The state of a channel in S* is (sequence of messages corresponding to pre-recording sends) − (sequence of messages corresponding to pre-recording receives).
Proof: The state of channel c from process p to process q recorded in S* is the sequence of messages received on c by q after q records its state and before q receives a marker on c. The sequence of messages sent by p before the marker is the sequence corresponding to pre-recording sends on c, and the messages q received before recording its state correspond to pre-recording receives on c.

44
[Figure: an execution of the system of slide 28 through the events "p sends", "q sends", "p receives", with each event marked as pre- or post-recording]

45 (Another execution)
[Figure: an execution of the system of slide 28 through the events "q sends", "p sends", "p receives", with events marked as pre- or post-recording]

46 What did we get? A configuration that could have happened.

47
seq = (e_i : i ≥ 0): a distributed computation
S_i: the state of the system right before e_i occurs
S_0: the initial state of the system
S_t: the state of the system at the termination of the algorithm
S*: the recorded global state

48 Stable Detection
D: a distributed system
y: a predicate function defined on the set of global states of D
S, S’: global states of D
y is a stable property of D if y(S) implies y(S’) for all S’ reachable from S.

49 Algorithm
Input: a stable property y
Output: a boolean value b with the property: y(S_0) ⇒ b and b ⇒ y(S_t)
Algorithm:
begin
  record a global state S*
  b := y(S*)
end
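The detection algorithm is short enough to write out. In this sketch the snapshot itself is abstracted behind a caller-supplied function; `detect`, `terminated` and the "passive" state label are assumptions chosen for illustration, with termination standing in as an example of a stable property.

```python
def terminated(global_state):
    """Example stable predicate y: every process is passive and no
    message is in transit."""
    process_states, channel_states = global_state
    return (all(s == "passive" for s in process_states.values())
            and all(len(msgs) == 0 for msgs in channel_states.values()))


def detect(record_global_state, y):
    """Record a global state S* and return b := y(S*)."""
    s_star = record_global_state()
    return y(s_star)


# Stub snapshots (hypothetical) standing in for a run of the snapshot algorithm.
def quiet():
    return ({"p": "passive", "q": "passive"}, {("p", "q"): [], ("q", "p"): []})


def busy():
    return ({"p": "passive", "q": "passive"}, {("p", "q"): ["work"], ("q", "p"): []})


print(detect(quiet, terminated))  # True
print(detect(busy, terminated))   # False: a message is still in transit
```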

50 Correctness
1. S* is reachable from S_0
2. S_t is reachable from S*
3. y(S) ⇒ y(S’) for all S’ reachable from S
S_0 → S* → S_t
y(S*) = true ⇒ y(S_t) = true
y(S*) = false ⇒ y(S_0) = false

51 References
K. M. Chandy and L. Lamport, Distributed Snapshots: Determining Global States of Distributed Systems