Uncoordinated Checkpointing The Global State Recording Algorithm.

Slides:



Advertisements
Similar presentations
Distributed Snapshots: Determining Global States of Distributed Systems - K. Mani Chandy and Leslie Lamport.
Advertisements

Global States.
Distributed Snapshots: Determining Global States of Distributed Systems Joshua Eberhardt Research Paper: Kanianthra Mani Chandy and Leslie Lamport.
Global States in a Distributed System By John Kor and Yvonne Cheng.
Global States and Checkpoints
Distributed Computing 5. Snapshot Shmuel Zaks ©
Scalable Algorithms for Global Snapshots in Distributed Systems
SES Algorithm SES: Schiper-Eggli-Sandoz Algorithm. No need for broadcast messages. Each process maintains a vector V_P of size N - 1, N the number of processes.
1 Global State $500$200 A B C1: Empty C2: Empty Global State 1 $450$200 A B C1: Tx $50 C2: Empty Global State 2 $450$250 A B C1: Empty C2: Empty Global.
Uncoordinated Checkpointing The Global State Recording Algorithm Cristian Solano.
Termination Detection of Diffusing Computations Chapter 19 Distributed Algorithms by Nancy Lynch Presented by Jamie Payton Oct. 3, 2003.
CS542 Topics in Distributed Systems Diganta Goswami.
Distributed Computing 5. Snapshot Shmuel Zaks ©
OSU CIS Lazy Snapshots Nigamanth Sridhar and Paul A.G. Sivilotti Computer and Information Science The Ohio State University
Termination Detection. Goal Study the development of a protocol for termination detection with the help of invariants.
Distributed Systems Dinesh Bhat - Advanced Systems (Some slides from 2009 class) CS 6410 – Fall 2010 Time Clocks and Ordering of events Distributed Snapshots.
Causality & Global States. P1 P2 P Physical Time 4 6 Include(obj1 ) obj1.method() P2 has obj1 Causality violation occurs when order.
Ordering and Consistent Cuts Presented By Biswanath Panda.
CMPT 431 Dr. Alexandra Fedorova Lecture VIII: Time And Global Clocks.
Distributed Systems Fall 2009 Logical time, global states, and debugging.
Computer Science Lecture 11, page 1 CS677: Distributed OS Last Class: Clock Synchronization Logical clocks Vector clocks Global state.
Chapter 11 Detecting Termination and Deadlocks. Motivation – Diffusing computation Started by a special process, the environment environment sends messages.
Ordering and Consistent Cuts Presented by Chi H. Ho.
EEC-681/781 Distributed Computing Systems Lecture 11 Wenbing Zhao Cleveland State University.
Cloud Computing Concepts
Chapter 10 Global Properties. Unstable Predicate Detection A predicate is stable if, once it becomes true it remains true Snapshot algorithm is not useful.
1 Distributed Process Management: Distributed Global States and Distributed Mutual Exclusion.
Computer Science Lecture 10, page 1 CS677: Distributed OS Last Class: Clock Synchronization Physical clocks Clock synchronization algorithms –Cristian’s.
Consistent Global States of Distributed Systems: Fundamental Concepts and Mechanisms CS 249 Project Fall 2005 Wing Wong.
CIS 720 Distributed algorithms. “Paint on the forehead” problem Each of you can see other’s forehead but not your own. I announce “some of you have paint.
1 Distributed Systems CS 425 / CSE 424 / ECE 428 Global Snapshots Reading: Sections 11.5 (4 th ed), 14.5 (5 th ed)  2010, I. Gupta, K. Nahrtstedt, S.
Distributed Computing 5. Snapshot Shmuel Zaks ©
Chapter 9 Global Snapshot. Global state  A set of local states that are concurrent with each other Concurrent states: no two states have a happened before.
Diffusing Computation. Using Spanning Tree Construction for Solving Leader Election Root is the leader In the presence of faults, –There may be multiple.
Lecture 6-1 Computer Science 425 Distributed Systems CS 425 / ECE 428 Fall 2013 Indranil Gupta (Indy) September 12, 2013 Lecture 6 Global Snapshots Reading:
Distributed Snapshot. Think about these -- How many messages are in transit on the internet? --What is the global state of a distributed system of N processes?
Distributed Systems Fall 2010 Logical time, global states, and debugging.
CSE 486/586, Spring 2013 CSE 486/586 Distributed Systems Global States Steve Ko Computer Sciences and Engineering University at Buffalo.
Distributed Snapshot. One-dollar bank Let a $1 coin circulate in a network of a million banks. How can someone count the total $ in circulation? If not.
Hwajung Lee. -- How many messages are in transit on the internet? --What is the global state of a distributed system of N processes? How do we compute.
CS4231 Parallel and Distributed Algorithms AY 2006/2007 Semester 2 Lecture 5 Instructor: Haifeng YU.
D ISTRIBUTED S YSTEM UNIT-2 Theoretical Foundation for Distributed Systems Prepared By: G.S.Mishra.
Fault tolerance and related issues in distributed computing Shmuel Zaks GSSI - Feb
Fault tolerance and related issues in distributed computing Shmuel Zaks GSSI - Feb
CSE 486/586 CSE 486/586 Distributed Systems Global States Steve Ko Computer Sciences and Engineering University at Buffalo.
1 Chapter 11 Global Properties (Distributed Termination)
Hwajung Lee. -- How many messages are in transit on the internet? --What is the global state of a distributed system of N processes? How do we compute.
Token-passing Algorithms Suzuki-Kasami algorithm The Main idea Completely connected network of processes There is one token in the network. The holder.
Distributed Systems Lecture 6 Global states and snapshots 1.
Ordering of Events in Distributed Systems UNIVERSITY of WISCONSIN-MADISON Computer Sciences Department CS 739 Distributed Systems Andrea C. Arpaci-Dusseau.
Global state and snapshot
Lecture 3: State, Detection
Global state and snapshot
Lecture 3: State, Detection
Theoretical Foundations
Distributed Snapshots & Termination detection
Distributed Snapshot.
EECS 498 Introduction to Distributed Systems Fall 2017
Distributed Snapshot.
湖南大学-信息科学与工程学院-计算机与科学系
Logical Clocks and Casual Ordering
Time And Global Clocks CMPT 431.
Distributed Snapshot Distributed Systems.
Uncoordinated Checkpointing
ITEC452 Distributed Computing Lecture 8 Distributed Snapshot
Distributed Snapshot.
Distributed algorithms
CIS825 Lecture 5 1.
Chandy-Lamport Example
Distributed Snapshot.
Presentation transcript:

Uncoordinated Checkpointing The Global State Recording Algorithm

Global State Recording CS 5204 – Operating Systems2 The Model Node properties No shared memory No global clock node channel Channel properties: FIFO loss free non­duplicating

Global State Recording CS 5204 – Operating Systems3 The Problem $500 $200 C1:empty C2:empty $450 $200 C1:transfer $50 C2:empty $450 $250 C1:empty C2:empty

Global State Recording CS 5204 – Operating Systems4 Motivation for recording a “consistent” state of the global computation: checkpointing for fault tolerance (rollback, recovery) testing and debugging monitoring and auditing Method: detecting stable properties in a distributed system via snapshots. A property is “stable” if, once it holds in a state, it holds in all subsequent states. termination deadlock garbage collection Distributed Snapshot (Global State Recording)

Global State Recording CS 5204 – Operating Systems5 Local State and Actions: local state: LS i message send: send(m ij ) message receive: rec(m ij ) time: time(x) send(m ij )  LS i iff time(send(m ij )) < time(LS i ) rec(m ij )  LS j iff time(rec(m ij )) < time(LS j ) Predicates: transit(LS i, LS j ) = {m ij | send(m ij )  LS i  !( rec(m ij )  LS j ) ) } inconsistent(LS i, LS j ) = {m ij | !(send(m ij )  LS i )  rec(m ij )  LS j ) } Consistent Global State:  i,  j : 1 <= i, j <= n :: inconsistent( LS i, LS j ) =  Definitions

Global State Recording CS 5204 – Operating Systems6 Marker­Sending Rule for a Process p: for (each channel c, incident on, and directed away from p) { p sends one marker along c after p records its state and before p sends further messages along c; } Marker­Receiving Rule for a Process q: if (q has not recorded its state) then { q records its state; q records the state of c as the empty sequence; } else { q records the state of c as the sequence of message received along c after q's state was recorded and before q received the marker along c. } Global­State­Recording Algorithm

Global State Recording CS 5204 – Operating Systems7 p empty q state A state C S0S0 p records its state (A) and sends marker M on channel p M empty q state A state C S1S1 before receiving the marker, q changes its state and sends message D. p M D q state A state D S2S2 q receives the marker and records its state (D) and the incoming channel as empty; q send marker M' on its outgoing channel. p empty M’ q state B state D S3S3 on receiving the marker, p records the channel as having message D p empty D q state A state D recorded state

Global State Recording CS 5204 – Operating Systems8 Snapshot/State Recording Example M = Marker p 500 q r c3 c4 c2 c1

Global State Recording CS 5204 – Operating Systems9 Snapshot/State Recording Example (Step 1) p 490 q r c3 c4 c2 c1 M

Global State Recording CS 5204 – Operating Systems10 Snapshot/State Recording Example (Step 2) p 490 q r c3 c4 c2 c M M 25

Global State Recording CS 5204 – Operating Systems11 Snapshot/State Recording Example (Step 3) p 470 q r c3 c4 c2 c1 20 M M 25

Global State Recording CS 5204 – Operating Systems12 Snapshot/State Recording Example (Step 4) p 490 q r c3 c4 c2 c1 M 25

Global State Recording CS 5204 – Operating Systems13 Snapshot/State Recording Example (Step 5) 485 p 515 q r 500 c3 c4 c2 c1