OSU CIS Lazy Snapshots Nigamanth Sridhar and Paul A.G. Sivilotti Computer and Information Science The Ohio State University

Slides:



Advertisements
Similar presentations
Distributed Snapshots: Determining Global States of Distributed Systems - K. Mani Chandy and Leslie Lamport.
Advertisements

Global States.
Distributed Snapshots: Determining Global States of Distributed Systems Joshua Eberhardt Research Paper: Kanianthra Mani Chandy and Leslie Lamport.
From Coulouris, Dollimore, Kindberg and Blair Distributed Systems: Concepts and Design Edition 5, © Addison-Wesley 2012 Slides for Chapter 14: Time and.
Global States in a Distributed System By John Kor and Yvonne Cheng.
Lecture 8: Asynchronous Network Algorithms
SES Algorithm SES: Schiper-Eggli-Sandoz Algorithm. No need for broadcast messages. Each process maintains a vector V_P of size N - 1, N the number of processes.
PROTOCOL VERIFICATION & PROTOCOL VALIDATION. Protocol Verification Communication Protocols should be checked for correctness, robustness and performance,
1 Global State $500$200 A B C1: Empty C2: Empty Global State 1 $450$200 A B C1: Tx $50 C2: Empty Global State 2 $450$250 A B C1: Empty C2: Empty Global.
Uncoordinated Checkpointing The Global State Recording Algorithm.
Uncoordinated Checkpointing The Global State Recording Algorithm Cristian Solano.
Time and Global States Part 3 ECEN5053 Software Engineering of Distributed Systems University of Colorado, Boulder.
Synchronization Chapter clock synchronization * 5.2 logical clocks * 5.3 global state * 5.4 election algorithm * 5.5 mutual exclusion * 5.6 distributed.
Termination Detection of Diffusing Computations Chapter 19 Distributed Algorithms by Nancy Lynch Presented by Jamie Payton Oct. 3, 2003.
CS542 Topics in Distributed Systems Diganta Goswami.
Distributed Computing 5. Snapshot Shmuel Zaks ©
Dr. Kalpakis CMSC 621, Advanced Operating Systems. Logical Clocks and Global State.
Distributed Snapshot (continued)
S NAPSHOT A LGORITHM. W HAT IS A S NAPSHOT - INTUITION Given a system of processors and communication channels between them, we want each processor to.
CS 582 / CMPE 481 Distributed Systems
Causality & Global States. P1 P2 P Physical Time 4 6 Include(obj1 ) obj1.method() P2 has obj1 Causality violation occurs when order.
CMPT 431 Dr. Alexandra Fedorova Lecture VIII: Time And Global Clocks.
Distributed Systems Fall 2009 Logical time, global states, and debugging.
CPSC 668Set 12: Causality1 CPSC 668 Distributed Algorithms and Systems Fall 2009 Prof. Jennifer Welch.
Slides for Chapter 10: Time and Global State
Computer Science Lecture 11, page 1 CS677: Distributed OS Last Class: Clock Synchronization Logical clocks Vector clocks Global state.
Ordering and Consistent Cuts
Ordering and Consistent Cuts Presented by Chi H. Ho.
EEC-681/781 Distributed Computing Systems Lecture 11 Wenbing Zhao Cleveland State University.
Cloud Computing Concepts
Computer Science Lecture 10, page 1 CS677: Distributed OS Last Class: Clock Synchronization Physical clocks Clock synchronization algorithms –Cristian’s.
Dr. Kalpakis CMSC 621, Advanced Operating Systems. Fall 2003 URL: Logical Clocks and Global State.
CIS 720 Distributed algorithms. “Paint on the forehead” problem Each of you can see other’s forehead but not your own. I announce “some of you have paint.
1 Distributed Systems CS 425 / CSE 424 / ECE 428 Global Snapshots Reading: Sections 11.5 (4 th ed), 14.5 (5 th ed)  2010, I. Gupta, K. Nahrtstedt, S.
Distributed Computing 5. Snapshot Shmuel Zaks ©
Lecture 6-1 Computer Science 425 Distributed Systems CS 425 / ECE 428 Fall 2013 Indranil Gupta (Indy) September 12, 2013 Lecture 6 Global Snapshots Reading:
Distributed Snapshot. Think about these -- How many messages are in transit on the internet? --What is the global state of a distributed system of N processes?
Distributed Systems Fall 2010 Logical time, global states, and debugging.
CSE 486/586, Spring 2013 CSE 486/586 Distributed Systems Global States Steve Ko Computer Sciences and Engineering University at Buffalo.
Distributed Snapshot. One-dollar bank Let a $1 coin circulate in a network of a million banks. How can someone count the total $ in circulation? If not.
Hwajung Lee. -- How many messages are in transit on the internet? --What is the global state of a distributed system of N processes? How do we compute.
CS4231 Parallel and Distributed Algorithms AY 2006/2007 Semester 2 Lecture 5 Instructor: Haifeng YU.
Fault tolerance and related issues in distributed computing Shmuel Zaks GSSI - Feb
Ordering of Events in Distributed Systems UNIVERSITY of WISCONSIN-MADISON Computer Sciences Department CS 739 Distributed Systems Andrea C. Arpaci-Dusseau.
CSE 486/586 CSE 486/586 Distributed Systems Global States Steve Ko Computer Sciences and Engineering University at Buffalo.
Lecture 4-1 Computer Science 425 Distributed Systems (Fall2009) Lecture 4 Chandy-Lamport Snapshot Algorithm and Multicast Communication Reading: Section.
1 Chapter 11 Global Properties (Distributed Termination)
Hwajung Lee. -- How many messages are in transit on the internet? --What is the global state of a distributed system of N processes? How do we compute.
Efficient Algorithms for Distributed Snapshots and Global Virtual Time Approximation Author: Friedermann Mattern Presented By: Shruthi Koundinya.
Distributed Systems Lecture 6 Global states and snapshots 1.
CSCE 668 DISTRIBUTED ALGORITHMS AND SYSTEMS
Global state and snapshot
Consistent cut A cut is a set of events.
Global state and snapshot
Lecture 3: State, Detection
CSE 486/586 Distributed Systems Global States
Distributed Snapshot.
Distributed Snapshot.
湖南大学-信息科学与工程学院-计算机与科学系
Distributed Snapshot Distributed Systems.
Uncoordinated Checkpointing
Slides for Chapter 11: Time and Global State
ITEC452 Distributed Computing Lecture 8 Distributed Snapshot
Distributed Snapshot.
CSE 486/586 Distributed Systems Global States
Jenhui Chen Office number:
CIS825 Lecture 5 1.
Consistent cut If this is not true, then the cut is inconsistent
Slides for Chapter 14: Time and Global States
Chandy-Lamport Example
Presentation transcript:

OSU CIS Lazy Snapshots Nigamanth Sridhar and Paul A.G. Sivilotti Computer and Information Science The Ohio State University

OSU CIS Outline Global state Inequality characterization of marker- based approach Lazy snapshot algorithm –Some specializations Conclusion

OSU CIS Distributed Systems Finite set of processes and a finite set of FIFO channels No globally shared memory or clock Process communication is via message passing Described by a directed graph –The nodes represent processes; edges represent channels

OSU CIS Global State Union of the local states of the processes, as well as the states of the channels Since there is no sharing of memory between the processes, the global state has to be detected by all the processes cooperating in some way A global snapshot is the state of the entire system at a particular point in time –state of each process –state of each channel (messages in transit)

OSU CIS Consistent Cut Meaningful global state Every message recorded as received has also been recorded as sent –No orphan messages p q r m1m1 m3m3 m2m2 m4m4 m5m5

OSU CIS Inconsistent Cut Global state is meaningless System could never be in such a state Channels may include orphan messages p q r m1m1 m3m3 m2m2 m4m4 m5m5

OSU CIS Marker Approach to Snapshots Marker messages are used to distinguish events before and after the local snapshot in each process –Marker messages signal when a process should take its local snapshot Union of all these local snapshots yields global snapshot Marker messages must be sent so that resultant cut is consistent –Ordering of marker messages should rule out orphan messages

OSU CIS Marker Algorithm Desiderata Safety: The state gathered is consistent –Every message recorded as received must be recorded as sent –Every message recorded as in transit must be recorded as sent Progress –The algorithm must terminate to yield a global snapshot

OSU CIS Outline Global state Inequality characterization of marker-based approach Lazy snapshot algorithm –Some specializations Conclusion

OSU CIS Some Terms p.RLS: process p records its local state p.SM(q): process p sends marker to q p.RM(q): process p receives marker from q p.RD(q): process p receives a message from q after receipt of marker from q (on a dirty channel) p.US(q): process p sends a message to q after its local snapshot (unrecorded send) p.LMR(q): last message sent by process p to q before its local snapshot (last recorded send)

OSU CIS Characterization of Marker Algorithm L1. (  p :: p.RLS  (Min q :: p.RD(q))) Process p must record its local state before the first message along a dirty channel is received p q q.RLS q.SM(p) q.US(q) p.RM(q) p.RD(q)

OSU CIS Characterization of Marker Algorithm (contd.) L2. (  p, q :: p.LMR(q)  p.SM(q)  p.US(q) ) Process p must send a marker along each of its outgoing channels before sending any unrecorded messages along that channel but not before the last recorded message q p p.LMR(q) p.US(q) p.SM(q)

OSU CIS Outline Global state Inequality characterization of marker- based approach Lazy snapshot algorithm –Some specializations Conclusion

OSU CIS Lazy Snapshots: A Marker Algorithm Marker Sending Rule for process p. For each outgoing channel C, p sends one marker along C, in accordance with L2 Marker Receiving Rule for process q. On receiving a marker along channel C, mark C as dirty; If q has not recorded its local state q records state of C as empty Else q records the state of C as the sequence of messages received along C upto this point after q recorded its local state State Recording Rule for process p. Process p records its state before receiving any messages along a dirty channel (L1)

OSU CIS Specializing Lazy Snapshots The inequalities L1 and L2 characterize a class of algorithms that gather global state in a distributed system Depending on the application, the level of “laziness” can be varied –Processes have flexibility in scheduling their local snapshot

OSU CIS Chandy-Lamport Algorithm Local state recording is tightly coupled to marker receiving –Process records local state immediately upon receiving first marker –Markers are sent out from a process after local snapshot Constrains flexibility, but easy to prove correctness

OSU CIS Piggybacking Algorithm In this scheme, marker messages are not sent separately Messages in the underlying computation are augmented with marker information –Each message carries with it information about whether it is a “before” message or “after” message Extreme case of laziness –Local snapshot is postponed as much as possible

OSU CIS Conclusions The new characterization captures an entire class of marker algorithms A generalized lazy snapshot algorithm Applications can choose the level of laziness

OSU CIS Questions? Author Contact: Nigamanth Sridhar and Paul Sivilotti Computer and Information Science The Ohio State University 2015 Neil Ave Columbus OH

OSU CIS

Chandy-Lamport Marker Algorithm Markers used to distinguish events that happened before and after the snapshot Algorithm outline –Initiator sends out markers to all its neighbors –Each process, on receiving its first marker, »takes its local snapshot »Sends markers on all its outgoing channels –Each process, on receiving each subsequent marker »Updates the channel state to include messages between markers

OSU CIS Marker Algorithm Properties No message received at a process p after the first marker is included in p’s local state Each subsequent marker causes p to update the state of the channel on which the marker was received In a high-traffic system, this could mean inefficiency of system execution

OSU CIS Global State Detection using the Chandy-Lamport Algorithm Process q need not have taken its local snapshot when its first marker arrived p q r

OSU CIS An Optimization The safety spec does not mandate recording local state immediately upon receipt of the first marker The recording of local state can be postponed as long as no orphan messages are included in the snapshot

OSU CIS Lazy Snapshots On receiving a marker from q, process p –“remembers” the marker (marks the channel dirty) –sends markers along all outgoing channels –postpones the recording of its local state Local state recording can be postponed as long as p does not receive a message along a dirty channel If a process p has received markers along all its incoming channels and has still not taken its local snapshot, it is done now

OSU CIS Lazy Snapshots: Advantages The number of “in-transit” messages in the global state is reduced Processes have flexibility in choosing when to schedule the recording of local state

OSU CIS New Characterization of Marker Algorithm Process p must record its local state before, or at the latest, at the time of receiving its first marker E1. (  p :: p.RLS  (Min q :: p.RM(q))) Local snapshot can occur Anywhere here ( p.RLS ) p.RM(q)

OSU CIS New Characterization of Marker Algorithm Process p must send a marker along each of its outgoing channels after recording its local state and before sending any messages along that channel E2. (  p, q :: p.RLS  p.SM(q)  p.US(q)) p.RLS p.SM(q)p.US(q) p q q.RLS

OSU CIS Proof of Correctness Safety –S1. Every message recorded as received has been recorded as sent –S2. Every message recorded as in transit has been recorded as sent Progress –Every process takes its local snapshot