Global State Recording


- definitions
- global state recording
  - FIFO
    - Chandy-Lamport's algorithm
    - collecting global state
    - incremental snapshot
  - non-FIFO
    - Lai-Yang two-color algorithm
    - Mattern's vector clocks algorithm
- consistent global snapshots
  - causality and zigzag paths
  - rollback dependency graph

Local State
- the local state LSi of a site (process) Si is an assignment of values to the variables of Si
- sending send(mij) and receiving rec(mij) of a message mij from Si to Sj may influence LSi
- time(send(mij)) or time(rec(mij)) denotes the sequence number of the state in the computation after which the send/receive occurs (note the difference with Singhal)
- time(LSi) denotes the state at which Si was recorded
- to aid the reasoning, we consider the messages sent/received by a process as belonging to its local state
- we define
    transit(LSi, LSj) = { mij : send(mij) ∈ LSi and rec(mij) ∉ LSj }
    inconsistent(LSi, LSj) = { mij : send(mij) ∉ LSi and rec(mij) ∈ LSj }
- that is,
  - a message is in transit if it was sent but not received
  - a message is inconsistent if it was received but never sent

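Global State
- a global state is the collection of the local states of all processes together with the sets of messages in the channels
  - notice that Singhal does not use messages in his definition; ours is more precise
- a global state is consistent if it does not have any inconsistent messages, that is: for every pair Si, Sj, no message mij is recorded as received in LSj without being recorded as sent in LSi
- a global state is transitless if there are no messages in transit, that is: for every pair Si, Sj, every message mij recorded as sent in LSi is also recorded as received in LSj
- note that a consistent state is not necessarily transitless and vice versa
- what are the global states on the picture above?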
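The two definitions can be checked mechanically. The sketch below is only an illustration: it assumes each recorded local state is reduced to the sets of message ids it contains as sent and as received, and that message ids are globally unique. The names LocalState, in_transit and inconsistent are made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class LocalState:
    """A recorded local state, reduced to the message events it contains."""
    sent: set = field(default_factory=set)      # ids of messages recorded as sent
    received: set = field(default_factory=set)  # ids of messages recorded as received


def in_transit(global_state):
    """Messages recorded as sent by some process but received by none."""
    sent = set().union(*(ls.sent for ls in global_state.values()))
    received = set().union(*(ls.received for ls in global_state.values()))
    return sent - received


def inconsistent(global_state):
    """Messages recorded as received by some process but sent by none."""
    sent = set().union(*(ls.sent for ls in global_state.values()))
    received = set().union(*(ls.received for ls in global_state.values()))
    return received - sent


def is_consistent(global_state):
    return not inconsistent(global_state)


def is_transitless(global_state):
    return not in_transit(global_state)


# gs1: m2 was sent but not yet received -> consistent, but not transitless.
gs1 = {"S1": LocalState(sent={"m1", "m2"}), "S2": LocalState(received={"m1"})}
# gs2: m3 was received but its send was not recorded -> transitless, but inconsistent.
gs2 = {"S1": LocalState(), "S2": LocalState(received={"m3"})}
```

The two sample states show that neither property implies the other.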

Chandy-Lamport's Global State Recording Algorithm (9/10/2018)
- works on an arbitrary-topology system with FIFO channels and an arbitrary algorithm whose snapshot is taken (the basic algorithm)
- does not interfere with the operation of the basic algorithm (does not delay, reorder or drop basic messages)
- one process initiates recording by sending control messages (markers)
  - multiple processes can also initiate
- draw a tree process example
- can C-L record an inconsistent state?
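The slide does not spell out the marker rules, so the following is a minimal sketch of the standard ones: on recording, a process saves its state and sends a marker on every outgoing channel; it then records each incoming channel until a marker arrives on it. The sketch assumes a fully connected two-process system whose local state is a single balance; Process, Network and the amounts are hypothetical.

```python
from collections import deque

MARKER = "MARKER"


class Network:
    """FIFO channels, one queue per ordered pair (src, dst)."""
    def __init__(self):
        self.channels = {}
        self.procs = {}

    def send(self, src, dst, msg):
        self.channels.setdefault((src, dst), deque()).append(msg)

    def deliver(self, src, dst):
        msg = self.channels[(src, dst)].popleft()
        self.procs[dst].receive(src, msg, self)


class Process:
    def __init__(self, pid, balance, peers, net):
        self.pid, self.balance, self.peers = pid, balance, peers
        self.recorded_state = None   # local state saved by the snapshot
        self.channel_state = {}      # incoming channel -> recorded messages
        self.recording = set()       # incoming channels still being recorded
        net.procs[pid] = self

    def transfer(self, dst, amount, net):
        """Basic (application) message: move money to dst."""
        self.balance -= amount
        net.send(self.pid, dst, amount)

    def start_snapshot(self, net):
        """Record local state, then send a marker on every outgoing channel."""
        self.recorded_state = self.balance
        self.channel_state = {p: [] for p in self.peers}
        self.recording = set(self.peers)
        for p in self.peers:
            net.send(self.pid, p, MARKER)

    def receive(self, src, msg, net):
        if msg == MARKER:
            if self.recorded_state is None:
                self.start_snapshot(net)   # first marker: record now
            self.recording.discard(src)    # channel src -> self is complete
        else:
            self.balance += msg
            if src in self.recording:      # arrived after recording, before marker
                self.channel_state[src].append(msg)


net = Network()
a = Process("A", 500, ["B"], net)
b = Process("B", 200, ["A"], net)
a.transfer("B", 50, net)   # in flight on A -> B
b.transfer("A", 30, net)   # in flight on B -> A
a.start_snapshot(net)      # A records 450; marker queued behind the 50
net.deliver("A", "B")      # B receives 50
net.deliver("A", "B")      # B receives marker, records 220, sends its marker
net.deliver("B", "A")      # A receives 30 while recording channel B -> A
net.deliver("B", "A")      # A receives marker, stops recording
total = (a.recorded_state + b.recorded_state
         + sum(a.channel_state["B"]) + sum(b.channel_state["A"]))
```

In this run the recorded balances plus the recorded channel contents add up to the 700 that the two processes started with, which is the invariant a consistent snapshot should preserve.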

Global State Recorded by Chandy-Lamport's Algorithm
- does C-L record a (global) state that occurs in the computation? Not necessarily.
- however, C-L records a state in a computation that is equivalent to the original computation
- moreover, this equivalent computation shares with the original computation
  - a prefix up to the start of the snapshot
  - a suffix after the snapshot
  - the recorded state is between these two states
- can C-L record a state where some P have messages in every channel? if yes, which one?
- can several independent snapshots run in parallel?

Collecting Global State
- based on a spanning tree constructed on the fly
- the sender of the first marker to arrive at a process is its parent
- each marker carries the sender's parent; by receiving a marker, a process learns whether it has children
- if a process is a leaf, then after finishing state recording it sends its state to its parent
- each process waits for its children's states, appends its own and forwards all the information to its parent

Non-FIFO: Lai-Yang Algorithm
- a non-FIFO channel is a set (rather than a queue) of messages; any message in the set can be received
  - messages can overtake one another
  - fair message receipt is assumed: eventually a sent message is received
- two colors for processes and basic messages, white and red; no explicit markers
- all processes start as white; when a process sends a basic message, it attaches its color
- when a process receives a differently colored message, it changes color itself
- while white (red), a process records all messages sent/received; after changing color, the process sends its message history to the initiator
- based on the sent/received histories, the initiator calculates the messages in transit
- if only the number of messages in transit is needed, processes may maintain counters instead of histories
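The initiator's calculation can be sketched as a set difference per channel: whatever Pi recorded as sent to Pj while white, minus whatever Pj recorded as received from Pi while white. The layout of the two history dictionaries and the function names below are assumptions made for this illustration, as is the use of unique message ids.

```python
def channel_states(sent_while_white, received_while_white):
    """Return {(i, j): messages in transit from Pi to Pj}.

    sent_while_white[i][j]     -- ids Pi sent to Pj before changing color
    received_while_white[j][i] -- ids Pj received from Pi before changing color
    """
    transit = {}
    for i, by_dest in sent_while_white.items():
        for j, sent in by_dest.items():
            got = set(received_while_white.get(j, {}).get(i, []))
            # Sent while white but not received while white: still in the channel.
            transit[(i, j)] = [m for m in sent if m not in got]
    return transit


def transit_counts(sent_counts, received_counts):
    """Counter-only variant: number of messages in transit per channel."""
    return {
        (i, j): n - received_counts.get(j, {}).get(i, 0)
        for i, by_dest in sent_counts.items()
        for j, n in by_dest.items()
    }


# Non-FIFO: "c" overtook "b", so P2 has seen "c" and "a" but not "b".
sent = {"P1": {"P2": ["a", "b", "c"]}, "P2": {"P1": ["x"]}}
recv = {"P2": {"P1": ["c", "a"]}, "P1": {"P2": ["x"]}}
result = channel_states(sent, recv)
counts = transit_counts({"P1": {"P2": 3}, "P2": {"P1": 1}},
                        {"P2": {"P1": 2}, "P1": {"P2": 1}})
```

Because the difference is taken over sets rather than prefixes, the calculation does not depend on delivery order, which is what the non-FIFO setting requires.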

State Recording Using Causal Message Delivery
- the initiator broadcasts a token to all processes
- each process records its state and sends it to the initiator
- processes do not send markers or coordinate local state recording
- due to causal ordering of messages (→ denotes happened-before):
  - if for Pi: rec(tokeni) → send(mij)
  - then send(tokenj) → send(mij)
  - therefore rec(tokenj) → rec(mij)
  - hence state recording at Pj happens before the receipt of mij
- channel state recording (Acharya-Badrinath)
  - append sequence numbers to all messages; at each process, record the highest sent/received SN together with the local state
  - send the sent/received records to the initiator, which determines the messages in transit
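The initiator's side of the channel state recording can be sketched as follows. The sketch assumes per-channel sequence numbers that start at 1; under causal (hence FIFO) delivery, the messages in transit from Pi to Pj are then exactly those numbered above the highest SN Pj recorded as received and up to the highest SN Pi recorded as sent. The dictionary layout and the function name are hypothetical.

```python
def messages_in_transit(sent, recd):
    """Return {(i, j): sequence numbers still in the channel Pi -> Pj}.

    sent[i][j] -- highest SN Pi had sent to Pj when Pi recorded its state
    recd[j][i] -- highest SN Pj had received from Pi when Pj recorded its state
    """
    transit = {}
    for i, by_dest in sent.items():
        for j, last_sent in by_dest.items():
            last_recd = recd.get(j, {}).get(i, 0)
            # Numbers last_recd+1 .. last_sent were sent but not yet received.
            transit[(i, j)] = list(range(last_recd + 1, last_sent + 1))
    return transit


sent = {"P1": {"P2": 5}, "P2": {"P1": 2}}
recd = {"P1": {"P2": 2}, "P2": {"P1": 3}}
in_flight = messages_in_transit(sent, recd)
```

Here P1 had sent five messages to P2 and P2 had received three of them, so messages 4 and 5 form the recorded state of that channel, while the channel from P2 to P1 is empty.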

Consistent Global Snapshot
- processes periodically and asynchronously record local states (local checkpoints)
- a global snapshot is a collection of local checkpoints
  - needed in distributed failure recovery, distributed event monitoring, debugging, etc.
- a global snapshot is consistent if no two of its checkpoints are causally related
- even though checkpoints themselves are not causally related, they may not be part of a consistent global snapshot
  - ex: C11 and C32 are concurrent, yet there is no consistent global snapshot that contains both of them
- the objective in global snapshot recording is to select (out of the available ones) a set of concurrent checkpoints
- note that, unlike in global state recording, the algorithm has no control over when the checkpoints are taken

Zigzag Path
- checkpoint interval: the part of the computation between two successive checkpoints at the same process
- a zigzag path exists between checkpoints Cxi and Cyj if there exists a sequence of messages m1, …, mn such that
  - m1 is sent by Px after Cxi
  - if mk is received by Pz, then mk+1 is sent by Pz in the same or a later checkpoint interval (mk+1 may be sent before mk is received)
  - mn is received by Py before Cyj
- causal path: same as a zigzag path, but the messages are causally related (each mk+1 is sent after mk is received)
- a zigzag cycle is a zigzag path from a checkpoint back to itself


Necessary and Sufficient Condition for Consistent Snapshot
- a consistent snapshot can be formed to include a set S of checkpoints if and only if no zigzag path exists between any two checkpoints in S, including from a checkpoint to itself [Netzer and Xu]
- snapshot line: a line drawn through a set of checkpoints
- when a zigzag path exists between two checkpoints, every snapshot line through them crosses a message, making two checkpoints causally related and the resulting snapshot inconsistent
- constructing a consistent snapshot requires choosing checkpoints with no zigzag path between them
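The zigzag condition can be tested with a breadth-first search over messages. The sketch below assumes a numbering convention that is not in the slides: checkpoint (p, k) is the k-th checkpoint of process p, and interval k of p is the computation after checkpoint k and before checkpoint k+1. A message may follow another on a path when it is sent by the receiver in the same or a later interval. Msg and both function names are hypothetical.

```python
from collections import deque, namedtuple

# send_iv / recv_iv: checkpoint interval in which the send / receive occurred.
Msg = namedtuple("Msg", "sender send_iv receiver recv_iv")


def zigzag_path_exists(msgs, cx, cy):
    """True if a zigzag path leads from checkpoint cx=(x, i) to cy=(y, j)."""
    (x, i), (y, j) = cx, cy
    # m1 must be sent by Px after Cxi, i.e. in interval i or later.
    frontier = deque(m for m in msgs if m.sender == x and m.send_iv >= i)
    seen = set(frontier)
    while frontier:
        m = frontier.popleft()
        # mn must be received by Py before Cyj, i.e. in an interval below j.
        if m.receiver == y and m.recv_iv < j:
            return True
        for nxt in msgs:
            # Next message: sent by the receiver in the same or a later interval.
            if (nxt not in seen and nxt.sender == m.receiver
                    and nxt.send_iv >= m.recv_iv):
                seen.add(nxt)
                frontier.append(nxt)
    return False


def consistent_snapshot_possible(msgs, checkpoints):
    """Netzer-Xu test: no zigzag path between any two chosen checkpoints,
    including from a checkpoint to itself (a zigzag cycle)."""
    return not any(zigzag_path_exists(msgs, a, b)
                   for a in checkpoints for b in checkpoints)


# m1: P1 -> P2, sent and received in interval 1.
# m2: P2 -> P3, sent in interval 1 of P2, received in interval 0 of P3.
# m2 may be sent before m1 arrives, so the path need not be causal.
msgs = [Msg("P1", 1, "P2", 1), Msg("P2", 1, "P3", 0)]
blocked = consistent_snapshot_possible(msgs, [("P1", 1), ("P3", 1)])
allowed = consistent_snapshot_possible(msgs, [("P1", 2), ("P3", 1)])
```

In the sample, checkpoint 1 of P1 and checkpoint 1 of P3 are joined by the path m1, m2 and so cannot appear in the same consistent snapshot, while checkpoint 2 of P1 and checkpoint 1 of P3 have no zigzag path in either direction.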

Runtime Consistent Snapshot Construction Using R-Graph
- definition of rollback-dependency graph (R-graph) [Wang]
- a basic message carries its checkpoint interval number
- there is a zigzag path between two checkpoints if there is a path in the R-graph
[Figure: example computation and the corresponding R-graph]