Goal: design a snapshot (global-state-detection) algorithm that records a collection of states of all system components (which together form a global system state), does not change the underlying computation, and does not freeze the underlying computation.
A process can: record its own state; send and receive messages; record the messages it sends and receives; cooperate with other processes.
Processes do not share clocks or memory.
Processes cannot record their states at precisely the same instant.
Motivation
Many problems in distributed systems can be stated in terms of detecting global states: stable property detection problems (termination detection, deadlock detection, etc.) and checkpointing.
Stable Property Detection Problem
D: a distributed system
y: a predicate function defined on the set of global states of D
S, S': global states of D
y is stable if y(S) implies y(S') for all S' reachable from S.
Many distributed algorithms are structured as a sequence of phases. A phase consists of a transient part followed by a stable part. Phase termination is distinct from computation termination.
Our view of the problem:
i. detect the termination of a phase
ii. initiate a new phase
Notice that “the kth phase has terminated” is a stable property.
Model
The distributed system D is a finite, labeled, directed graph.
[Figure: processes p and q connected by channels C1 and C2.]
Channels have infinite buffers, are error-free, and preserve FIFO order.
Message delay is bounded, but unknown.
State of a Channel
[Figure: processes p and q connected by channels C1 and C2, with messages 1, 2, 3 sent on C1.]
X = [1, 2, 3]: the sequence of messages that were sent on C1
Y = [1]: the sequence of messages that were received (a prefix of X)
State of C1: X \ Y = [2, 3]
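The definition of a channel's state as X \ Y can be sketched in a few lines of Python. The function name `channel_state` is a hypothetical helper, and Y = [1] is taken as the received prefix because it is the value consistent with the state [2, 3] given on the slide.

```python
def channel_state(sent, received):
    """Return the messages still in transit on a FIFO channel.

    `sent` is the sequence X of messages sent on the channel and `received`
    is the sequence Y of messages already received. On a FIFO channel Y must
    be a prefix of X, so the channel state is X with that prefix removed.
    """
    if sent[:len(received)] != received:
        raise ValueError("received must be a prefix of sent on a FIFO channel")
    return sent[len(received):]


X = [1, 2, 3]   # messages sent on C1
Y = [1]         # messages received so far (assumed, see the lead-in)
print(channel_state(X, Y))  # [2, 3]
```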
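Example: System
Distributed system: [Figure: processes p and q connected by channels C1 and C2.]
Initial global state: process states B and A, both channels empty (Ø).
State transitions (same for p and q): [Figure: states A and B, with transitions labeled send and receive.]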
Global state transition diagram (deterministic)
[Figure: global state transition diagram for the example system, with transitions labeled p sends, q receives, q sends, p receives.]
A computation corresponds to a path in the diagram.
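Example: System
Distributed system: [Figure: processes p and q connected by channels C1 and C2.]
State transitions:
p: [Figure: states A and B, with transitions labeled send and receive.]
q: [Figure: states C and D, with transitions labeled send and receive.]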
Global state transition diagram (non-deterministic)
[Figure: global state transition diagram for the system in which p is in state A or B and q is in state C or D, with transitions labeled p sends, q sends, p receives, q receives.]
We look at the following sequence of events: p sends, q sends, p receives.
[Figure: the corresponding path in the global state transition diagram.]
In the snapshot algorithm:
Each process records its own state.
p and q cooperate to record the state of the channel C from p to q.
Example: System
[Figure: the states of p, C, and q in the two-state (A, B) example system, each recorded at a different point of the computation.]
Recorded state: no token.
Example: System
[Figure: the states of p, C, and q in the two-state (A, B) example system, recorded in a different order.]
Recorded state: two tokens.
[Figure: two timelines relating three events (C's state is recorded, p sends a message on C, p's state is recorded), one for each order in which p, C, and q are recorded.]
q will record the state of C.
q starts recording C after it records its own state.
p and q have to coordinate, using a special marker.
q stops recording when it receives the marker from p.
But: how does q know when to record its state?
The snapshot algorithm
Who starts? We assume one process.
Homework: extend the discussion and the proof to any number of starters.
(Intuition for the Algorithm)
Who will record the state of channel C? q.
How does q know when to stop recording? p sends a marker right after it records its state, and before sending any other message.
q starts recording after it records its own state.
The snapshot algorithm: channel recording
Recording of C starts when q records its own state.
Recording of C ends when q receives the marker along C.
Note: for any q ≠ p_0, the channel along which the marker arrived first is recorded as empty (Ø).
The snapshot algorithm: state recording
p_0 starts: p_0 records its state, and then broadcasts the marker.
Shout-algorithm = PI (Propagation of Information) = hot potato = …
When q receives the marker for the first time, it records its own state.
The snapshot algorithm
Marker-Sending Rule for a process p:
1. record the state of p
2. send a marker along c before sending any other message
Marker-Receiving Rule for a process q, on receiving a marker along channel c:
if q's state is not recorded:
1. record q's state
2. record c's state as the empty sequence
else: c's state is the sequence of messages received along c since q recorded its state
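The two marker rules can be sketched as a small single-threaded Python simulation. This is an illustration under stated assumptions, not the slides' own code: the `Process` class, the `MARKER` sentinel, the `connect` and `snapshot_total` helpers, and the money-transfer application (balances 100 and 50) are all hypothetical, and FIFO channels are modeled as deques.

```python
from collections import deque

MARKER = "MARKER"  # hypothetical sentinel standing in for the marker message


class Process:
    def __init__(self, name, balance):
        self.name = name
        self.balance = balance      # application state (assumed: an account balance)
        self.out = {}               # peer name -> outgoing FIFO channel
        self.inc = {}               # peer name -> incoming FIFO channel
        self.recorded_state = None  # own state as recorded by the snapshot
        self.channel_state = {}     # peer name -> recorded state of the channel from that peer
        self.recording = set()      # incoming channels still being recorded

    def send(self, peer, amount):
        self.balance -= amount
        self.out[peer].append(amount)

    def record_and_send_markers(self):
        # Marker-sending rule: record own state, then send a marker on every
        # outgoing channel before any other message.
        self.recorded_state = self.balance
        for channel in self.out.values():
            channel.append(MARKER)
        self.recording = set(self.inc)
        self.channel_state = {peer: [] for peer in self.inc}

    def receive(self, peer):
        msg = self.inc[peer].popleft()
        if msg == MARKER:
            # Marker-receiving rule.
            if self.recorded_state is None:
                self.record_and_send_markers()
            # Either way, recording of this channel stops here; if the state
            # was just recorded, the channel is recorded as empty.
            self.recording.discard(peer)
        else:
            self.balance += msg
            if peer in self.recording:
                self.channel_state[peer].append(msg)


def connect(a, b):
    """Create one FIFO channel in each direction between a and b."""
    ab, ba = deque(), deque()
    a.out[b.name], b.inc[a.name] = ab, ab
    b.out[a.name], a.inc[b.name] = ba, ba


def snapshot_total(procs):
    """Sum of recorded process states and recorded in-transit messages."""
    in_transit = sum(sum(m) for p in procs for m in p.channel_state.values())
    return sum(p.recorded_state for p in procs) + in_transit


p, q = Process("p", 100), Process("q", 50)
connect(p, q)
p.send("q", 10)               # 10 is in transit from p to q
p.record_and_send_markers()   # p starts the snapshot and records 90
q.send("p", 20)               # sent before q has seen the marker
p.receive("q")                # p records 20 as part of the channel from q
q.receive("p")                # q receives 10 before recording its state
q.receive("p")                # q receives the marker and records 40
p.receive("q")                # p receives q's marker and stops recording
print(p.recorded_state, q.recorded_state, p.channel_state, q.channel_state)
print(snapshot_total([p, q]))
```

In this run the recorded global state (p = 90, q = 40, 20 in transit from q to p) preserves the total of 150 even though the system never passed through exactly that state.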
Termination
Assumption: no marker remains forever in an input channel.
Claim: if the graph is strongly connected and at least one process records its state, then all processes will record their state in finite time.
Proof: by induction on the distance from a process that has recorded its state.
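The Recorded Global State
Example system: [Figure: processes p and q connected by channels C1 and C2.]
State transitions:
p: [Figure: states A and B, with transitions labeled send and receive.]
q: [Figure: states C and D, with transitions labeled send and receive.]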
[Figure: global state transition diagram for the example system, following the events p sends, q sends, p receives.]
Event e in process p is an atomic action: it can change the state of p and the state of at most one channel c incident on p (by sending or receiving a message M along c).
e is defined by e = <p, s, s', M, c>, where s and s' are the states of p immediately before and after e.
e may occur in global state S if:
1. the state of p in S is s
2. if c is directed towards p, then c's state has M at its head
Process State and Global State
A process: a set of states, an initial state, and a set of events.
A global state S: a collection of process states and channel states.
Initially, each process is in its initial state and all channels are empty.
next(S, e) is the global state after event e is applied to global state S.
Process State and Global State
seq = (e_i : 0 ≤ i ≤ n) is a computation of the system iff, for every i, e_i may occur in S_i and S_{i+1} = next(S_i, e_i), where S_0 is the initial global state.
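The event model can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the `Event` fields mirror an event tuple of process, state before, state after, message, and channel; a global state is a pair of dictionaries; channels are keyed by hypothetical `(source, destination)` pairs; and the function is named `next_state` only to avoid shadowing Python's built-in `next`.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Event:
    p: str                          # process in which the event occurs
    s: str                          # state of p before the event
    s2: str                         # state of p after the event
    M: Optional[str]                # message sent or received, if any
    c: Optional[Tuple[str, str]]    # channel (source, destination), if any


def may_occur(S, e):
    procs, chans = S
    if procs[e.p] != e.s:
        return False
    if e.c is not None and e.c[1] == e.p:
        # c is directed towards p, so e is a receive: M must be at the head.
        return len(chans[e.c]) > 0 and chans[e.c][0] == e.M
    return True


def next_state(S, e):
    procs, chans = S
    procs = {**procs, e.p: e.s2}
    chans = dict(chans)
    if e.c is not None:
        if e.c[1] == e.p:
            chans[e.c] = chans[e.c][1:]         # receive: remove M from the head
        else:
            chans[e.c] = chans[e.c] + (e.M,)    # send: append M to the tail
    return procs, chans


def is_computation(S0, seq):
    S = S0
    for e in seq:
        if not may_occur(S, e):
            return False
        S = next_state(S, e)
    return True


# Hypothetical two-process example: p sends M to q, then q receives it.
S0 = ({"p": "A", "q": "C"}, {("p", "q"): (), ("q", "p"): ()})
send = Event("p", "A", "B", "M", ("p", "q"))
recv = Event("q", "C", "D", "M", ("p", "q"))
print(is_computation(S0, [send, recv]))  # True
print(is_computation(S0, [recv, send]))  # False
```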
The Recorded Global State
seq = (e_i : i ≥ 0): a distributed computation
S_i: the state of the system right before e_i occurs
S_0: the initial state of the system
S_t: the state of the system at the termination of the algorithm
S*: the recorded global state
Definition
Event e_j is called pre-recording if e_j is in a process p and p records its state after e_j in seq.
Event e_j is called post-recording if e_j is in a process p and p records its state before e_j in seq.
Assume that e_{j-1} is a post-recording event that immediately precedes a pre-recording event e_j in seq.
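Claim: the sequence obtained by interchanging e_{j-1} and e_j is a computation.
Proof: e_{j-1} occurs in a process p and e_j in a process q other than p, since within a single process a post-recording event cannot precede a pre-recording event. There cannot be a message sent at e_{j-1} and received at e_j: p sends a marker on the channel right after recording its state, so on a FIFO channel the marker would reach q before that message, and e_j would be post-recording. Hence event e_j can occur in global state S_{j-1}. The state of process p is not altered by e_j, hence e_{j-1} can occur after e_j.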
Proof
Swap the events until all post-recording events appear after all pre-recording events. The resulting computation is seq'.
All that is left to show: S* is the global state after all pre-recording events and before all post-recording events.
1. Process states
2. Channel states
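The reordering step can be sketched in Python as repeated adjacent swaps. This only illustrates the rearrangement: the function name `reorder` and the `(name, kind)` event encoding are hypothetical, and the code does not check that each swap yields a valid computation, which is what the preceding claim establishes.

```python
def reorder(seq, is_post):
    """Move post-recording events after pre-recording ones by adjacent swaps.

    Whenever a post-recording event immediately precedes a pre-recording
    event, the two are interchanged. The relative order of events of the
    same kind is preserved.
    """
    seq = list(seq)
    swapped = True
    while swapped:
        swapped = False
        for j in range(1, len(seq)):
            if is_post(seq[j - 1]) and not is_post(seq[j]):
                seq[j - 1], seq[j] = seq[j], seq[j - 1]
                swapped = True
    return seq


# Hypothetical labeled computation: (event name, kind)
events = [("e0", "pre"), ("e1", "post"), ("e2", "pre"), ("e3", "post"), ("e4", "pre")]
result = reorder(events, lambda e: e[1] == "post")
print([name for name, _ in result])  # ['e0', 'e2', 'e4', 'e1', 'e3']
```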
Claim: the state of a channel in S* is the sequence of messages corresponding to pre-recording sends on the channel, with the prefix corresponding to pre-recording receives removed.
Proof: the state of channel c from process p to process q recorded in S* is the sequence of messages received on c by q after q records its state and before q receives a marker on c. The sequence of messages sent by p before the marker is the sequence corresponding to pre-recording sends on c.
[Figure: the computation p sends, q sends, p receives in the example system, with the events marked post, pre, post.]
(Another execution)
[Figure: the computation q sends, p sends, p receives in the example system, with the events marked pre and post.]
What did we get? A configuration that could have happened.
Stable Detection
D: a distributed system
y: a predicate function defined on the set of global states of D
S, S': global states of D
y is a stable property of D if y(S) implies y(S') for all S' reachable from S.
Algorithm
Input: a stable property y
Output: a boolean value b with the property: y(S_0) → b and b → y(S_t)
Algorithm:
begin
  record a global state S*
  b := y(S*)
end
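The detection algorithm amounts to evaluating the predicate on a recorded snapshot, which can be sketched in Python. Everything named here is an assumption for illustration: `detect_stable`, the `terminated` predicate (termination taken as "all processes passive and all channels empty"), the dictionary layout of a global state, and the stub recorder standing in for an actual snapshot.

```python
def detect_stable(record_global_state, y):
    """Record a global state S* and return b = y(S*)."""
    S_star = record_global_state()
    return y(S_star)


def terminated(S):
    """Example stable predicate (assumed): no active process, no message in transit."""
    all_passive = all(state == "passive" for state in S["processes"].values())
    all_empty = all(len(msgs) == 0 for msgs in S["channels"].values())
    return all_passive and all_empty


# Stub recorders standing in for a real snapshot algorithm.
def snapshot_done():
    return {"processes": {"p": "passive", "q": "passive"},
            "channels": {("p", "q"): [], ("q", "p"): []}}


def snapshot_busy():
    return {"processes": {"p": "passive", "q": "passive"},
            "channels": {("p", "q"): ["work"], ("q", "p"): []}}


print(detect_stable(snapshot_done, terminated))  # True
print(detect_stable(snapshot_busy, terminated))  # False
```

A result of False says only that the property did not hold in the initial state; it may have become true since S* was recorded, so a detector would typically repeat the snapshot.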
Correctness
1. S* is reachable from S_0
2. S_t is reachable from S*
3. y(S) → y(S') for all S' reachable from S
S_0 → S* → S_t
If y(S*) = true, then y(S_t) = true.
If y(S*) = false, then y(S_0) = false.
References
K. M. Chandy and L. Lamport, Distributed Snapshots: Determining Global States of Distributed Systems.