5 Goal: design a snapshot (=global-state- detection) algorithm that: will record a collection of states of all system components (which forms a global system state), will not change the underlying computation, will not freeze the underlying computation
6 A Process Can… record its own state, send and receive messages, record messages it sends and receives, cooperate with other processes Processes do not share clocks or memory Processes cannot record their state precisely at the same instant
7 Motivation Many problems in distributed systems can be stated in terms of the problem of detecting global states: Stable property detection problems : termination detection, deadlock detection etc. Checkpointing
8 Stable Property Detection Problem D - distributed system y - a predicate function defined on the set of global states of D S, S’ – global states of D y is stable if y(S) implies y(S’) for all S’ reachable from S
many distributed algorithms are structured as a sequence of phases A phase: transient part, then a stable part phase termination vs. computation termination our view on the problem: i.detect the termination of a phase ii.initiate a new phase Notice that “the kth phase has terminated” is a stable property 9
10 Model Distributed system D is a finite, labeled, directed graph. p q C2 C1 Channels have infinite buffers, are error- free and preserve FIFO Message delay is bounded, but unknown
11 State of a Channel 1 p q C1 23 1 [1, 2, 3] – sequence X of messages that were sent  – sequence Y of received messages ( prefix of X ) [2, 3] – state of C1: X \ Y pq C2 C1
12 Example: System Distributed system: p C2C2 C1C1 Initial global state: B A Ø Ø State transitions (same for p and q): A B send receive q
13 A A Ø A A Ø A B Ø Ø B A Ø Ø A computation corresponds to a path in the diagram p qq p p sends q receives q sends p receives q sends C1C1 p C2C2 q deterministic A B send receive Global state transition diagram
14 Distributed system: State transition: p : q : CD send receive A B send receive p C2C2 C1C1 q Example: System
15 qp C2C2 C1C1 A D Ø B C Ø B D A C Ø Ø p qq p p sends q sends p receives Global state transition diagram q receives non-deterministic q sends A B send receive CD send receive q receives
16 qp C2C2 C1C1 A D Ø B C Ø B D A C Ø Ø p qq p p sends q sends p receives We look at the following sequence of events: A B send receive CD send receive
17 Each process records its own state p and q cooperate to record the state of C. p C q in the snapshot algorithm:
18 B A Ø p q Example: System A A A A Recorded state: p C q Ø No token C1C1 p C2C2 q A B send receive Record C Record q Record p
19 B A Ø Ø p q Example: System B A A A Ø Recorded state: p C1C1 q Two tokens Record p Record C Record q C1C1 p C2C2 q A B send receive
C’s state recorded time P sends a message on C P’s state recorded C’s state recorded P sends a message on C P’s state recorded 20 Record p Record C Record q Record C Record q Record p
21 q will record the state of C q starts recording C after it records its state p C q p and q have to coordinate ; using a special marker q stops when receiving from p But: how does q know when to record its state?
22 Who starts? We assume one process. The snapshot algorithm Hw: extend discussion + proof to any number of startes.
Who will record the state of channel C? q How q knows when to stop recording? p sends right after it records its state, and before sending any other message q starts recording after it records its state (Intuition for the Algorithm) p C q 23
24 The snapshot algorithm Ends when q receives along C Starts when q records itself channel recording p C q Note : for any q p 0, the channel along which arrived first is recorded as
25 p 0 starts. The snapshot algorithm p 0 recoreds its state, and then broadcasts. Shout-algorithm = PI (Propogation-of-information)= hot potato = … When q receives for the first time, it records its own state State recording
26 1. record the state of p 2. send along c before sending any other message Marker-Receiving Rule for a process q if q’s state is not recorded: 1. record state; 2. record c’s state = ; else: c’s state is the sequence of messages received since q recorded its state The snapshot algorithm on receiving along channel c: Marker-Sending Rule for a process q
Termination Assumption No marker remains forever in an input channel Claim: If the graph is strongly connected and at least one process records its state, then all processes will record their state in finite time Proof: by induction 27
28 The Recorded Global State State transition: p : q : C D send receive A B send receive p C2C2 C1C1 q Ex: System
29 A D B C B D A C pqqp p sends q sends p receives A D qp C2C2 C1C1 A B send receive CD send receive A
31 Event e in process p is an atomic action: can change the state of p, and a state of at most one channel c incident on p (by sending/receiving message M along c ) e is defined by e = may occur in global state S if 1. the state of p in S is s. 2 a. if c is directed towards p: c ’s state has M in its head, and is deleted after applying e. b. if c is directed from p: c ’s state has M in its tail after applying e. 3. the state of p after applying e is s’.
32 Process State and Global State A process: set of states, an initial state set of events A global state S : collection of process states and channel states initially, each process is in its initial state and all channels are empty next(S, e) is the global state after event e in applied to global state S
33 Process State and Global State seq = (e i : i = 0…n) is a computation of the system iff e i may occur in S i, S i+1 = next(S i, e i ) (S 0 is the initial global state)
34 seq = (e i : i ≥ 0) a distributed computation S i – the state of the system right before e i occurs S 0 – the initial state of the system S t – the state of the system at the termination of the algorithm S* - the recorded global state The Recorded Global State
35 Definition Event e j is called pre-recording if e j is in a process p and p records its state after e j in seq. Event e j is called post-recording if e j is in a process p and p records its state before e j in seq. Assume that e j-1 is a post-recording event before Pre-recording event e j in seq. pre-recording post-recording
36 Lemma: Proof: e j-1 occurs in p and e j in q, and q ≠p (since e j-1 is and e j is.)
37 The only scenario that might prevent interchanging the two events is that a message M is sent at e j-1 and received at e j. but this cannot be possible: if M is sent at e j-1, then M is, so a marker was sent to q before M, so when it is received in e j q already recorded its state, so e j Is,a, a contradiction.
38 Hence, event e j can occur in global state S j-1. The state of process p is not altered by e j, hence e j-1 can occur after e j.
39 We have to show that the states of all Processes and channels are the same in S 2 and S 4. This clearly holds for proceses and channels That do not take part in ej-1 and ej.
40 states: the states of p and q in S2 and in S4 are the same. channels: whether ej-1/ej send/receive(/neither) a message along a channel, the same is done in both scenarios, So the states of the channels in S 2 and S 4 are the same. (End of proof. )
42 Proof Using the lemma, swap the events till all events appear after all events. The acquired computation is seq’. All that is left to show: S* is a global state after all events and before all events. 1.Process states 2.Channel states
43 Claim: The state of a channel in S* is (sequence of messages corresp. to pre-recorded receives)-(sequence of messages corresp. to prerecorded sends) Proof: The state of channel c from process p to process q recorded in S* is the sequence of messages received on c by q after q records its state and before q receives a marker on c. The sequence of messages sent by p is the sequence corres. to prerecording sends on c.
44 A D B C D A C pq q p p sends q sends p receives A D B post pre post qp C2C2 C1C1 A B send receive CD send receive
45 A D A D D A C p q q p q sends p sends p receives A D A (Another execution) pre post B qp C2C2 C1C1 A B send receive CD send receive
What did we get? A configuration that could have happened 46
seq = (e i : i ≥ 0) a distributed computation S i – the state of the system right before e i occurs S 0 – the initial state of the system S t – the state of the system at the termination of the algorithm S* - the recorded global state 47
Stable Detection D - distributed system y - a predicate function defined on the set of global states of D S, S’ – global states of D y is a stable property of D if y(S) implies y(S’) for all S’ reachable from S 48
49 Input: A stable property y Output: a boolean value b with the property: y(S 0 ) b and b y(S t ) Algorithm Algorithm: begin record a global state S* b := y(S*) end
50 Correctness 1. S* is reachable from S 0 2. S t is reachable from S* 3. y(S) y(S’) for all S’ reachable from S S 0 S* S t y(S*)=true y(S t )=true y(S*)=false y(S 0 )=false
References K. M. Chandy and L. Lamport, Distributed Snapshots: Determining Global States of Distributed Systems 51