CSE 486/586, Spring 2013 CSE 486/586 Distributed Systems Global States Steve Ko Computer Sciences and Engineering University at Buffalo.

Slides:



Advertisements
Similar presentations
Global States.
Advertisements

From Coulouris, Dollimore, Kindberg and Blair Distributed Systems: Concepts and Design Edition 5, © Addison-Wesley 2012 Slides for Chapter 14: Time and.
CSE 486/586, Spring 2012 CSE 486/586 Distributed Systems Consensus Steve Ko Computer Sciences and Engineering University at Buffalo.
CSE 486/586, Spring 2014 CSE 486/586 Distributed Systems Reliable Multicast Steve Ko Computer Sciences and Engineering University at Buffalo.
Uncoordinated Checkpointing The Global State Recording Algorithm Cristian Solano.
Time and Global States Part 3 ECEN5053 Software Engineering of Distributed Systems University of Colorado, Boulder.
CS542 Topics in Distributed Systems Diganta Goswami.
Distributed Computing 5. Snapshot Shmuel Zaks ©
CS514: Intermediate Course in Operating Systems Professor Ken Birman Vivek Vishnumurthy: TA.
CS 582 / CMPE 481 Distributed Systems
Causality & Global States. P1 P2 P Physical Time 4 6 Include(obj1 ) obj1.method() P2 has obj1 Causality violation occurs when order.
Ordering and Consistent Cuts Presented By Biswanath Panda.
CMPT 431 Dr. Alexandra Fedorova Lecture VIII: Time And Global Clocks.
Distributed Systems Fall 2009 Logical time, global states, and debugging.
CPSC 668Set 12: Causality1 CPSC 668 Distributed Algorithms and Systems Fall 2009 Prof. Jennifer Welch.
Slides for Chapter 10: Time and Global State
Computer Science Lecture 11, page 1 CS677: Distributed OS Last Class: Clock Synchronization Logical clocks Vector clocks Global state.
EEC-681/781 Distributed Computing Systems Lecture 11 Wenbing Zhao Cleveland State University.
CSE 486/586, Spring 2012 CSE 486/586 Distributed Systems Logical Time Steve Ko Computer Sciences and Engineering University at Buffalo.
Cloud Computing Concepts
Computer Science Lecture 10, page 1 CS677: Distributed OS Last Class: Clock Synchronization Physical clocks Clock synchronization algorithms –Cristian’s.
Lecture 3-1 Computer Science 425 Distributed Systems Lecture 3 Logical Clock and Global States/ Snapshots Reading: Chapter 11.4&11.5 Klara Nahrstedt.
CSE 486/586, Spring 2013 CSE 486/586 Distributed Systems Logical Time Steve Ko Computer Sciences and Engineering University at Buffalo.
CSE 486/586, Spring 2012 CSE 486/586 Distributed Systems Consensus Steve Ko Computer Sciences and Engineering University at Buffalo.
1 Distributed Systems CS 425 / CSE 424 / ECE 428 Global Snapshots Reading: Sections 11.5 (4 th ed), 14.5 (5 th ed)  2010, I. Gupta, K. Nahrtstedt, S.
Distributed Computing 5. Snapshot Shmuel Zaks ©
1 Consistent Global States of Distributed Systems: Fundamental Concepts and Mechanisms Author: Ozalp Babaoglu and Keith Marzullo Distributed Systems: 526.
Chapter 9 Global Snapshot. Global state  A set of local states that are concurrent with each other Concurrent states: no two states have a happened before.
Lecture 6-1 Computer Science 425 Distributed Systems CS 425 / ECE 428 Fall 2013 Indranil Gupta (Indy) September 12, 2013 Lecture 6 Global Snapshots Reading:
Distributed Snapshot. Think about these -- How many messages are in transit on the internet? --What is the global state of a distributed system of N processes?
CSE 486/586 CSE 486/586 Distributed Systems Logical Time Steve Ko Computer Sciences and Engineering University at Buffalo.
Distributed Systems Fall 2010 Logical time, global states, and debugging.
Distributed Snapshot. One-dollar bank Let a $1 coin circulate in a network of a million banks. How can someone count the total $ in circulation? If not.
CSE 486/586, Spring 2012 CSE 486/586 Distributed Systems Mutual Exclusion & Leader Election Steve Ko Computer Sciences and Engineering University.
CS4231 Parallel and Distributed Algorithms AY 2006/2007 Semester 2 Lecture 5 Instructor: Haifeng YU.
Ordering of Events in Distributed Systems UNIVERSITY of WISCONSIN-MADISON Computer Sciences Department CS 739 Distributed Systems Andrea C. Arpaci-Dusseau.
CSE 486/586 CSE 486/586 Distributed Systems Global States Steve Ko Computer Sciences and Engineering University at Buffalo.
Lecture 4-1 Computer Science 425 Distributed Systems (Fall2009) Lecture 4 Chandy-Lamport Snapshot Algorithm and Multicast Communication Reading: Section.
1 Chapter 11 Global Properties (Distributed Termination)
Hwajung Lee. -- How many messages are in transit on the internet? --What is the global state of a distributed system of N processes? How do we compute.
CSE 486/586, Spring 2012 CSE 486/586 Distributed Systems Paxos Steve Ko Computer Sciences and Engineering University at Buffalo.
CSE 486/586, Spring 2014 CSE 486/586 Distributed Systems Logical Time Steve Ko Computer Sciences and Engineering University at Buffalo.
CSE 486/586, Spring 2014 CSE 486/586 Distributed Systems Paxos Steve Ko Computer Sciences and Engineering University at Buffalo.
Distributed Systems Lecture 6 Global states and snapshots 1.
CSE 486/586 Distributed Systems Reliable Multicast --- 1
Consistent cut A cut is a set of events.
Lecture 3: State, Detection
CSE 486/586 Distributed Systems Global States
Theoretical Foundations
CSE 486/586 Distributed Systems Logical Time
Distributed Snapshot.
EECS 498 Introduction to Distributed Systems Fall 2017
Distributed Snapshot.
湖南大学-信息科学与工程学院-计算机与科学系
Slides for Chapter 14: Time and Global States
Time And Global Clocks CMPT 431.
Chapter 5 (through section 5.4)
Slides for Chapter 11: Time and Global State
Slides for Chapter 14: Time and Global States
ITEC452 Distributed Computing Lecture 8 Distributed Snapshot
Distributed Snapshot.
CSE 486/586 Distributed Systems Global States
Jenhui Chen Office number:
CIS825 Lecture 5 1.
CSE 486/586 Distributed Systems Reliable Multicast --- 2
Consistent cut If this is not true, then the cut is inconsistent
Slides for Chapter 14: Time and Global States
CSE 486/586 Distributed Systems Reliable Multicast --- 1
Chandy-Lamport Example
Distributed Snapshot.
Presentation transcript:

CSE 486/586, Spring 2013 CSE 486/586 Distributed Systems Global States Steve Ko Computer Sciences and Engineering University at Buffalo

CSE 486/586, Spring 2013 Last Time Ordering of events –Many applications need it, e.g., collaborative editing, distributed storage, etc. Logical time –Lamport clock: single counter –Vector clock: one counter per process –Happens-before relation shows causality of events 2

CSE 486/586, Spring 2013 Today’s Question Example question: who has the most friends on Facebook? Challenges to answering this question? –It changes! What do we need? –A snapshot of the social network graph at a particular time 3

CSE 486/586, Spring 2013 Today’s Question Distributed debugging How do you debug this? –Log in to one machine and see what happens –Collect logs and see what happens –Taking a global snapshot! 4 P0P0P1P2 Deadlock! Both waiting…

CSE 486/586, Spring 2013 What Do We Want? Would you say this is a good snapshot? –No because e 2 1 might have been caused by e 3 1. Three things we want. –Per-process state –Messages in flight –All events that happened before each event in the snapshot 5 P1 P2 P3 e10e10 e11e11 e12e12 e13e13 e20e20 e21e21 e22e22 e30e30 e31e31 e32e32 A “cut”

CSE 486/586, Spring 2013 Obvious First Try Synchronize clocks of all processes –Ask all processes to record their states at known time t Problems? –Time synchronization possible only approximately –Another issue? –Does not record the state of messages in the channels Again: synchronization not required – causality is enough! What we need: logical global snapshot –The state of each process –Messages in transit in all communication channels 6 P0P0P1P2 msg

CSE 486/586, Spring 2013 How to Do It? Definitions For a process P i, where events e i 0, e i 1, … occur, history(P i ) = h i = prefix history(P i k ) = h i k = S i k : P i ’s state immediately after k th event For a set of processes P 1, …,P i, …. : Global history: H =  i (h i ) Global state: S =  i (S i k i ) A cut C  H = h 1 c1  h 2 c2  …  h n cn The frontier of C = {e i ci, i = 1,2, … n} 7 P1 P2 P3 e10e10 e11e11 e12e12 e13e13 e20e20 e21e21 e22e22 e30e30 e31e31 e32e32

CSE 486/586, Spring 2013 Consistent States A cut C is consistent if and only if  e  C (if f  e then f  C) A global state S is consistent if and only if it corresponds to a consistent cut 8 P1 P2 P3 e10e10 e11e11 e12e12 e13e13 e20e20 e21e21 e22e22 e30e30 e31e31 e32e32 Inconsistent cut Consistent cut

CSE 486/586, Spring 2013 Why Consistent States? #1: For each event, you can trace back the causality. #2: Back to the state machine (from the last lecture) –The execution of a distributed system as a series of transitions between global states: S0  S1  S2  … –…where each transition happens with one single action from a process (i.e., local process event, send, and receive) –Each state (S0, S1, S2, …) is a consistent state. 9

CSE 486/586, Spring 2013 CSE 486/586 Administrivia PA2 is out. –Please start from the content provider. Amazon EC2 is ready. –I will send out instructions regarding this soon. –We will give you two codes, $50/code. 10

CSE 486/586, Spring 2013 The “Snapshot” Algorithm Assumptions: There is a communication channel between each pair of processes process: N-1 in and N-1 out) Communication channels are unidirectional and FIFO- ordered No failure, all messages arrive intact, exactly once Any process may initiate the snapshot Snapshot does not interfere with normal execution Each process is able to record its state and the state of its incoming channels (no central collection) 11

CSE 486/586, Spring 2013 The “Snapshot” Algorithm Goal: records a set of process and channel states such that the combination is a consistent global state. Two questions: –#1: When to take a local snapshot at each process so that the collection of them can form a consistent global state? –#2: When to capture messages in flight sent before each local snapshot? Brief answer for #1 –The initiator broadcasts a “marker” message to everyone else (“hey, take a local snapshot now”) Brief answer for #2 –If a process receives a marker for the first time, it takes a local snapshot, starts recording all incoming messages, and broadcasts a marker again to everyone else. (“hey, I’ve sent all my messages before my local snapshot to you, so stop recording my messages.”) –A process stops recording, when it receives a marker for each channel. 12

CSE 486/586, Spring 2013 The “Snapshot” Algorithm Basic idea: marker broadcast & recording –The initiator broadcasts a “marker” message to everyone else (“hey, take a local snapshot now”) –If a process receives a marker for the first time, it takes a local snapshot, starts recording all incoming messages, and broadcasts a marker again to everyone else. (“hey, I’ve sent all my messages before my local snapshot to you, so stop recording my messages.”) –A process stops recording for each channel, when it receives a marker for that channel. 13 P1 P2 P3 a b M M M M M M

CSE 486/586, Spring 2013 The “Snapshot” Algorithm 1. Marker sending rule for initiator process P 0 After P 0 has recorded its own state for each outgoing channel C, send a marker message on C 2. Marker receiving rule for a process P k on receipt of a marker over channel C if P k has not yet recorded its own state record P k ’s own state record the state of C as “empty” for each outgoing channel C, send a marker on C turn on recording of messages over other incoming channels else record the state of C as all the messages received over C since P k saved its own state; stop recording state of C 14

CSE 486/586, Spring 2013 Chandy and Lamport’s Snapshot 15 Marker receiving rule for process p i On p i ’s receipt of a marker message over channel c: if (p i has not yet recorded its state) it records its process state now; records the state of c as the empty set; turns on recording of messages arriving over other incoming channels; else p i records the state of c as the set of messages it has received over c since it saved its state. end if Marker sending rule for process p i After p i has recorded its state, for each outgoing channel c: p i sends one marker message over c (before it sends any other message over c).

CSE 486/586, Spring 2013 Exercise 16 P1 P2 P3 e10e10 e20e20 e23e23 e30e30 e13e13 a b M e 1 1,2 M 1- P1 initiates snapshot: records its state (S1); sends Markers to P2 & P3; turns on recording for channels C21 and C31 e 2 1,2,3 M M 2- P2 receives Marker over C12, records its state (S2), sets state(C12) = {} sends Marker to P1 & P3; turns on recording for channel C32 e14e14 3- P1 receives Marker over C21, sets state(C21) = {a} e 3 2,3,4 M M 4- P3 receives Marker over C13, records its state (S3), sets state(C13) = {} sends Marker to P1 & P2; turns on recording for channel C23 e24e24 5- P2 receives Marker over C32, sets state(C32) = {b} e31e31 6- P3 receives Marker over C23, sets state(C23) = {} e13e13 7- P1 receives Marker over C31, sets state(C31) = {}

CSE 486/586, Spring 2013 One Provable Property The snapshot algorithm gives a consistent cut Meaning, –Suppose e i is an event in P i, and e j is an event in P j –If e i  e j, and e j is in the cut, then e i is also in the cut. Proof sketch: proof by contradiction –Suppose e j is in the cut, but e i is not. –Since e i  e j, there must be a sequence M of messages that leads to the relation. –Since e i is not in the cut (our assumption), a marker should’ve been sent before e i, and also before all of M. –Then P j must’ve recorded a state before e j, meaning, e j is not in the cut. (Contradiction) 17

CSE 486/586, Spring 2013 Another Provable Property Can we evaluate a stable predicate? –Predicate: a function: (a global state)  {true, false} –Stable predicate: once it’s true, it stays true the rest of the execution, e.g., a deadlock. A stable predicate that is true in S-snap must also be true in S-final –S-snap: the recorded global state –S-final: the global state immediately after the final state- recording action. Proof sketch –The necessity for a proof: S-snap is a snapshot that may or may not correspond to a snapshot from the real execution. –Strategy: prove that it’s part of what could have happened. –Take the actual execution as a linearization –Re-order the events to get another linearization that passes through S-snap. 18

CSE 486/586, Spring 2013 Related Properties Liveness (of a predicate): guarantee that something good will happen eventually –For any linearization starting from the initial state, there is a reachable state where the predicate becomes true. –“Guarantee of termination” is a liveness property Safety (of a predicate): guarantee that something bad will never happen –For any state reachable from the initial state, the predicate is false. –Deadlock avoidance algorithms provide safety Liveness and safety are used in many other CS contexts. 19

CSE 486/586, Spring 2013 Summary Global states –A union of all process states –Consistent global state vs. inconsistent global state The “snapshot” algorithm Take a snapshot of the local state Broadcast a “marker” msg to tell other processes to record Start recording all msgs coming in for each channel until receiving a “marker” Outcome: a consistent global state 20

CSE 486/586, Spring Acknowledgements These slides contain material developed and copyrighted by Indranil Gupta at UIUC.