Distributed Snapshots: Non-blocking checkpoint coordination protocol Next: Uncoordinated Chkpnt.


1 Distributed Snapshots: Non-blocking checkpoint coordination protocol Next: Uncoordinated Chkpnt

2 Uncoordinated: Processes take checkpoints independently. Risk: the domino effect! Next: Coordinated Blocking Chkpnt

3 Coordinated Blocking: Processes are coordinated to form a consistent global state, and … [Figure: timeline for the initiator and processes p1, p2, p3, with messages labeled "Ready!", "okay, channels flushed", and "Go!".] Next: Coordinated Blocking Chkpnt (cont'd)

4 Coordinated Blocking (cont'd): Advantages: always consistent; no domino effect; less storage overhead. Disadvantage: large checkpoint latency! Next: Coordinated Non-blocking Chkpnt

5 Coordinated Non-blocking: Processes are coordinated, but … do we really need to block …? [Photos: K. Mani Chandy, Leslie Lamport.] Next: Global-state Recording Algorithm

6 Global-state Recording Alg.: Step 1: process states. Step 2: channel states. Step 3: end of the algorithm. Source: “Distributed Snapshots: Determining Global States of Distributed Systems”, K. Mani Chandy and Leslie Lamport. Next: Model of Distributed System

7 Model of Distributed System: Processes. Channels: directed, FIFO, error-free. [Figure: processes p, q, r connected by channels c1, c2, c3, c4.] Next: Step 1, process states

8 Step 1: process states
Initiator:
- Save its local state.
- Send marker tokens on all outgoing edges.
All other processes: on receiving the first marker on any incoming edge,
- Save state, and propagate markers on all outgoing edges.
- Resume execution. Further markers are discarded.
Next: Example

9 Example [Figure: processes p, q, r with channels c1, c2, c3, c4; timelines for p, q, r showing the initiator, the markers, and the checkpoints.] Next: Proof

10 Proof: Let us assume that a message m exists, and it makes our cut inconsistent. [Figure: timelines of p and q with the cut and message m.] Next: Proof (cont'd)

11 Proof (cont'd): Two cases: (1) x1 is the 1st marker for process q; (2) x1 is not the 1st marker for process q. Either case contradicts the assumption. [Figure, incomplete in the source: timelines of p and q with markers x1 and x2 and message m.] Next: Step 2, channel states
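The consistency condition that the proof relies on can be written as a small check. This is a minimal sketch, not part of the slides: the `Message` record, the per-process event indices, and the function name `is_consistent` are all assumptions made for illustration. A cut is treated as inconsistent when some message is received before the receiver's checkpoint but sent after the sender's checkpoint.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    # Hypothetical record: times are local event indices, one counter per process.
    sender: str
    receiver: str
    send_time: int  # index of the send event at the sender
    recv_time: int  # index of the receive event at the receiver


def is_consistent(cut, messages):
    """cut maps a process name to the local event index of its checkpoint.

    Returns False if any message crosses the cut backwards, i.e. it was
    sent after the sender's checkpoint but received at or before the
    receiver's checkpoint.
    """
    for m in messages:
        sent_after = m.send_time > cut[m.sender]
        received_before = m.recv_time <= cut[m.receiver]
        if sent_after and received_before:
            return False
    return True


cut = {"p": 3, "q": 5}
fine = [Message("p", "q", 2, 4), Message("p", "q", 2, 7)]
orphan = [Message("p", "q", 4, 5)]  # the message m assumed in the proof

print(is_consistent(cut, fine))           # no message crosses the cut backwards
print(is_consistent(cut, fine + orphan))  # m makes the cut inconsistent
```

The second message in `fine` is sent before p's checkpoint and received after q's checkpoint; that is allowed, and it is exactly the kind of in-flight message that Step 2 has to record.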

12 Step 2: channel states. In-flight messages: sent along the channel before the sender's checkpoint and received along the channel after the receiver's checkpoint. [Figure: channel from p to q.] Next: Example
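As a rough illustration of this definition, the state of one FIFO channel can be computed from two logs. The function name `in_flight` and the two list arguments are hypothetical; the sketch assumes the cut is consistent, so that everything the receiver logged before its checkpoint is a prefix of what the sender logged before its checkpoint.

```python
def in_flight(sent_before_chkpnt, received_before_chkpnt):
    """Messages on one FIFO channel that belong to the channel state.

    sent_before_chkpnt: messages the sender put on the channel before
        its checkpoint, in send order.
    received_before_chkpnt: messages the receiver took off the channel
        before its checkpoint, in receive order.
    """
    n = len(received_before_chkpnt)
    # FIFO plus a consistent cut: the received log is a prefix of the sent log.
    if sent_before_chkpnt[:n] != received_before_chkpnt:
        raise ValueError("logs do not describe a consistent cut on a FIFO channel")
    # Whatever was sent but not yet received at checkpoint time is in flight.
    return sent_before_chkpnt[n:]


print(in_flight([1, 2, 3, 4], [1, 2]))  # messages 3 and 4 are in flight
print(in_flight([1, 2], [1, 2]))        # empty channel state
```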

13 Example: (1) p is receiving messages. (2) p has just saved its state. [Figure: two panels showing p and processes q, r, s, t, u, with messages numbered 1 to 8 in panel (1) and 4 to 8 in panel (2).] Next: Example (cont'd)

14 Example (cont'd): p's checkpoint is triggered by a marker from q. [Figure: p and processes q, r, s, t, u with messages numbered 1 to 8 and the markers on each channel.] Next: Algorithm (revised)

15 Algorithm (revised)
Initiator:
- Save its local state.
- Send marker tokens on all outgoing edges.
All other processes: on receiving the first marker on any incoming edge,
- Save state, and propagate markers on all outgoing edges.
- Resume execution, but also save the messages arriving on each incoming channel until a marker arrives through that channel.
- This guarantees a consistent global state!
Next: Step 3, end of the algorithm
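The revised algorithm can be exercised in a small single-threaded simulation. This is a sketch under stated assumptions, not the authors' code: the `Process` and `Network` classes, the integer "balance" used as process state, the fully connected topology, and the fixed delivery order in `drain` are all hypothetical choices for illustration. The one deviation from the slide wording is that the initiator also records its incoming channels, as in the Chandy and Lamport paper.

```python
from collections import deque

MARKER = "MARKER"


class Process:
    def __init__(self, name, state):
        self.name = name
        self.state = state       # application state: an integer balance
        self.saved_state = None  # local checkpoint, None until taken
        self.recording = {}      # incoming channel -> messages being recorded
        self.channel_state = {}  # incoming channel -> final recorded messages


class Network:
    def __init__(self, names, initial):
        self.procs = {n: Process(n, initial) for n in names}
        # One directed FIFO channel per ordered pair of processes.
        self.chan = {(s, d): deque() for s in names for d in names if s != d}

    def send(self, src, dst, amount):
        self.procs[src].state -= amount
        self.chan[(src, dst)].append(amount)

    def checkpoint(self, name):
        """Save state, start recording incoming channels, emit markers."""
        p = self.procs[name]
        p.saved_state = p.state
        for (s, d), queue in self.chan.items():
            if d == name:
                p.recording[(s, d)] = []
            if s == name:
                queue.append(MARKER)

    def deliver(self, src, dst):
        """Deliver the message at the head of channel src -> dst."""
        msg = self.chan[(src, dst)].popleft()
        p = self.procs[dst]
        if msg == MARKER:
            if p.saved_state is None:  # first marker triggers the checkpoint
                self.checkpoint(dst)
            # A marker closes the recording for the channel it arrived on.
            p.channel_state[(src, dst)] = p.recording.pop((src, dst))
        else:
            p.state += msg
            if (src, dst) in p.recording:  # in-flight message, part of the snapshot
                p.recording[(src, dst)].append(msg)

    def done(self):
        return all(p.saved_state is not None and not p.recording
                   for p in self.procs.values())

    def snapshot_total(self):
        saved = sum(p.saved_state for p in self.procs.values())
        flying = sum(sum(msgs) for p in self.procs.values()
                     for msgs in p.channel_state.values())
        return saved + flying


def drain(net):
    """Deliver everything, one message per channel per round, in a fixed order."""
    while any(net.chan.values()):
        for key in sorted(net.chan):
            if net.chan[key]:
                net.deliver(*key)


net = Network(["p", "q", "r"], initial=100)
net.send("p", "q", 10)
net.checkpoint("p")     # p acts as the initiator
net.send("q", "p", 20)  # sent before q's checkpoint, received after p's
net.send("r", "p", 5)
drain(net)

print(net.done(), net.snapshot_total())
```

In this run the saved states plus the recorded channel contents add up to the 300 units the system started with, even though no process ever stopped sending or receiving application messages.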

16 Step 3: end of the algorithm. Did every process save its state and in-flight messages? A direct channel to the initiator? A spanning tree? A general solution? [Figure: processes p, q, r and the initiator.] Next: References

17 References: K. Mani Chandy and Leslie Lamport, “Distributed Snapshots: Determining Global States of Distributed Systems”, ACM Transactions on Computer Systems 3(1), 1985.

